Bug 231178 - Very slow IO with small block size
Summary: Very slow IO with small block size
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: arm64 Any
Importance: --- Affects Only Me
Assignee: freebsd-arm (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-05 16:29 UTC by Robert David
Modified: 2018-10-13 19:18 UTC (History)
4 users

See Also:


Attachments

Description Robert David 2018-09-05 16:29:43 UTC
On raspberry pi3 aarch64, I have noticed very slow IO on any type of storage device (USB, mmc) when the block size is small.

An example dd run shows a huge difference (>10x):



pi-backup ~ # dd if=/dev/mmcsd0 of=/dev/zero bs=1M
3781+1 records in
3781+1 records out
3965190144 bytes transferred in 226.929478 secs (17473226 bytes/sec)

pi-backup ~ # dd if=/dev/mmcsd0 of=/dev/zero bs=512
^C981585+0 records in
981585+0 records out
502571520 bytes transferred in 375.254570 secs (1339282 bytes/sec)

(Note: I stopped it, because it would have taken over half an hour to read the 4GB sd card)

Similar results can be seen on a USB hdd. This practically makes the speed of any fs (UFS/ZFS) <1MB/s.


For comparison this is results from Linux 4.14

rpi64 ~ # dd if=/dev/mmcblk0 of=/dev/zero bs=512
7744512+0 records in
7744512+0 records out
3965190144 bytes (4.0 GB, 3.7 GiB) copied, 178.385 s, 22.2 MB/s

rpi64 ~ # dd if=/dev/mmcblk0 of=/dev/zero bs=1M
3781+1 records in
3781+1 records out
3965190144 bytes (4.0 GB, 3.7 GiB) copied, 168.359 s, 23.6 MB/s


I do not expect FreeBSD to easily be on par with Linux, but on the other hand there seems to be a huge bottleneck/contention somewhere.

Since it is not driver specific (USB/mmc), I suspect something in the core kernel is not optimized yet.
Comment 1 Robert David 2018-09-05 16:31:10 UTC
Forgot to mention this is 12-alpha4.

pi-backup ~ # uname -a
FreeBSD pi-backup 12.0-ALPHA4 FreeBSD 12.0-ALPHA4 #0 r338410: Fri Aug 31 18:11:42 UTC 2018     root@releng3.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC  arm64
Comment 2 Warner Losh freebsd_committer freebsd_triage 2018-09-05 16:34:17 UTC
Do the numbers change if you use /dev/null instead of /dev/zero?
Comment 3 Robert David 2018-09-05 16:36:02 UTC
Sorry copy paste error.

No it does not change.
Comment 4 Ian Lepore freebsd_committer freebsd_triage 2018-09-05 16:38:33 UTC
This is the normal and expected behavior of an sdcard.  In FreeBSD, dd bypasses any buffering/caching in the kernel and reads/writes the device directly.  Doing tiny writes on an sdcard is slow because each 512b write potentially requires the card to do an internal read-erase-write of a 4 or 8MB block.  Tiny reads avoid the erase-block overhead, but still bypass any caching or readahead logic.

Perhaps in Linux dd does not bypass the caching layers of the kernel.

The assertion that doing 512b IO affects filesystem performance is not correct; the filesystem does not do 512b IO.  To test filesystem performance with dd correctly, use a file rather than a raw device as if= or of=.
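A minimal sketch of the file-based test suggested here (the paths are illustrative, not from this report; on real hardware, point MNT at a mount point on the device under test):

```shell
# File-based dd test: reads go through the filesystem, so the kernel's
# caching and readahead are in play. MNT defaults to /tmp so the sketch
# is self-contained; FreeBSD's dd also accepts lowercase suffixes (bs=1m).
MNT=${MNT:-/tmp}
dd if=/dev/zero of="$MNT/testfile" bs=1M count=16 2>/dev/null  # create test data
dd if="$MNT/testfile" of=/dev/null bs=512    # small-block read, cached path
dd if="$MNT/testfile" of=/dev/null bs=1M     # large-block read for comparison
rm -f "$MNT/testfile"
```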
Comment 5 Robert David 2018-09-05 16:46:48 UTC
I was tracking very slow performance of UFS/ZFS on rpi comparing to Linux. 

I know that dd is not a good benchmarking tool. I also know that filesystems do not do 512b reads; I wanted to test corner cases. When trying to do some tracing I hit panics with dtrace.

Anyway, for example, a zfs scrub of a USB HDD runs at about 10-20MB/s on Linux. On FreeBSD I'm at 1-2MB/s.
Comment 6 Ronald Klop 2018-09-05 18:44:35 UTC
Repeating the test on an SSD via USB-SATA on an rpi3b+.

dd with if=/dev/da0 gives the same results.

But using a regular file gives full USB2 speed.

dd if=/dev/random of=/blabla bs=1m count=10000
# file larger than memory to flush file cache.

dd if=/blabla of=/dev/null bs=512
^C
4561797632 bytes transferred in 211.879646 secs (21530136 bytes/sec)

dd if=/blabla of=/dev/null bs=1m
^C
1343225856 bytes transferred in 85.964873 secs (15625288 bytes/sec)
# even slower than bs=512

If I remember correctly, Linux uses a cache on the block device, so reading the raw device also reads from cache. FreeBSD caches in the FS layer, so the raw device is uncached. So what you see is by design. If you want to use the raw device and have speed, implement a layer of cache.
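Given that design, one userland workaround (a sketch, not from this report; the device path is hypothetical and simulated with a file here) is to re-block through a pipe, so the raw device sees large transfers even though the consumer reads 512-byte records:

```shell
# Re-blocking through a pipe: the first dd issues 1M reads against the
# (here simulated) raw device; the second dd consumes 512-byte records.
SRC=${SRC:-/tmp/fakedev}                # substitute e.g. /dev/mmcsd0 on real hardware
dd if=/dev/zero of="$SRC" bs=1M count=8 2>/dev/null   # simulate a device image
dd if="$SRC" bs=1M 2>/dev/null | dd of=/dev/null bs=512
rm -f "$SRC"
```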

I did not try to repeat the claim about ZFS scrub on my rpi3b+, but my amd64 desktop with SATA disks does this on full speed of the disk.
Comment 7 Ian Lepore freebsd_committer freebsd_triage 2018-09-05 18:49:22 UTC
(In reply to Ronald Klop from comment #6)

You can't reliably flush the filesystem cache by trying to force some amount of random data through it.  A good way to remove cache effects from such testing is to format up a filesystem on the device just for testing, then umount/remount it between each test.  When doing write tests, the time it takes to do the umount can be added to the time of the dd/whatever that did the writing, because it must synchronously flush all data to disk before the unmount returns.
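The umount/remount methodology described here can be sketched as below. The device and mount point are hypothetical, and the `run` wrapper only echoes each command so the sketch is harmless to execute as-is; drop the `echo` to run it for real (requires root and a scratch device you can safely newfs):

```shell
# Dry-run sketch of the umount/remount cache-busting procedure.
DEV=${DEV:-/dev/da0p1}                  # hypothetical scratch device
MNT=${MNT:-/mnt/scratch}                # hypothetical mount point
run() { echo "+ $*"; }                  # print instead of execute

run newfs "$DEV"                        # one-time: create a scratch filesystem
run mount "$DEV" "$MNT"
run dd if=/dev/zero of="$MNT/testfile" bs=1M count=512   # timed write test
run umount "$MNT"    # flushes all dirty data; add its time to the write test
run mount "$DEV" "$MNT"                 # remount: cache is now cold
run dd if="$MNT/testfile" of=/dev/null bs=1M             # timed uncached read
```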
Comment 8 Robert David 2018-09-06 08:44:14 UTC
Thanks for the info, I did not know Linux caches block devices directly.

I will try to do some better tests and share the results. Anyway, my main reason for the testing was to track down the much poorer performance of a USB->sata jbod (magnetic hdd's) with zfs. I use the rpi as backup zfs storage. I do not care much about huge speed, but when I compared my Linux+zfs with FreeBSD+zfs, my setup was several times slower. If it were 10-20% I would probably not have noticed. On Linux I'm saturating the 100M ethernet during zfs send/receive (it is acting as the bottleneck); on an rpi3b+ the speed would probably be much higher.

Enabling ZFS prefetch helped a little.
Comment 9 Robert David 2018-09-19 15:54:55 UTC
Results from filesystems read:

## freebsd
# ufs
~ # dd if=/mnt/testfile of=/dev/null
1358350+0 records in
1358350+0 records out
695475200 bytes transferred in 42.214574 secs (16474765 bytes/sec)

# zfs
~ # dd if=/test2/testfile of=/dev/null
1358350+0 records in
1358350+0 records out
695475200 bytes transferred in 112.644525 secs (6174070 bytes/sec)


## linux
# zfs
~ # dd if=/test2/testfile of=/dev/null
1358350+0 records in
1358350+0 records out
695475200 bytes (695 MB, 663 MiB) copied, 29.4545 s, 23.6 MB/s

# ext4
~ # dd if=/mnt/testfile of=/dev/null 
1358350+0 records in
1358350+0 records out
695475200 bytes (695 MB, 663 MiB) copied, 18.3759 s, 37.8 MB/s


There is no valid UFS test for Linux (I don't have the ufs driver compiled) and likewise no ext4 test on FreeBSD (both are only for comparison with zfs). I expected zfs to be slower than simpler filesystems. It is the same file each time, and the same pool test2 is imported on both systems.

Pool test2 is a simple one-disk pool (160GB rotational 2.5"), attached via a USB2-to-sata jbod (4x 2.5").

ugen0.4: <Sunplus Technology Inc. USB to Serial-ATA bridge> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (2mA)


I'm a little bit disappointed with the ZFS results on FreeBSD. If it were hitting >10MB/s it would be fine.