Bug 244356 - Writing to a USB 3.0 stick is very slow
Summary: Writing to a USB 3.0 stick is very slow
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: usb (show other bugs)
Version: 12.1-RELEASE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: freebsd-usb mailing list
URL:
Keywords: needs-qa, performance
Depends on:
Blocks:
 
Reported: 2020-02-24 01:20 UTC by Sebastien Boisvert
Modified: 2020-03-13 11:13 UTC (History)
4 users (show)

See Also:


Attachments
Output of usbdump for the USB 2.0 stick (which works fine) (4.08 KB, application/gzip)
2020-02-24 12:26 UTC, Sebastien Boisvert
no flags Details
Output of usbdump for the USB 3.0 stick (which is 10x times slower than the USB 2.0 stick) (748 bytes, application/gzip)
2020-02-24 12:27 UTC, Sebastien Boisvert
no flags Details
Output of dmesg with hw.usb.umass.debug=-1 (3.41 KB, application/gzip)
2020-02-24 22:41 UTC, Sebastien Boisvert
no flags Details
Output of dmesg with hw.usb.umass.debug=-1 with 2 Kingston DataTraveler 2.0 (3.21 KB, application/gzip)
2020-02-24 23:06 UTC, Sebastien Boisvert
no flags Details
dmesg with umass debugging and no USB devices plugged in (3.47 KB, application/gzip)
2020-02-25 00:55 UTC, Sebastien Boisvert
no flags Details
dmesg with umass debugging and Kingston DataTraveler 3.0 plugged in (USB 3.0 device in a USB 3.0 port) (5.04 KB, application/gzip)
2020-02-25 00:59 UTC, Sebastien Boisvert
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sebastien Boisvert 2020-02-24 01:20:31 UTC
Hello,

Writing to a USB 3.0 stick is very slow. USB 2.0 sticks work fine.
USB 3.0 stick plugged in a USB 2.0 port exhibit the same problem
compared to being plugged in a USB 3.0 port.

My FreeBSD installation is on a ZFS mirror of 2 identical 7.2 GiB USB 2.0 sticks.

Output of 'usbconfig' for these 2 storage devices:

ugen1.2: <Kingston DataTraveler 2.0> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)
ugen0.3: <Kingston DataTraveler 2.0> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)

Output of gpart:

root@atlantis:~ # gpart show da0
=>      40  15131560  da0  GPT  (7.2G)
        40      1024    1  freebsd-boot  (512K)
      1064       984       - free -  (492K)
      2048   4194304    2  freebsd-swap  (2.0G)
   4196352  10934272    3  freebsd-zfs  (5.2G)
  15130624       976       - free -  (488K)

root@atlantis:~ # gpart show da1
=>      40  15131560  da1  GPT  (7.2G)
        40      1024    1  freebsd-boot  (512K)
      1064       984       - free -  (492K)
      2048   4194304    2  freebsd-swap  (2.0G)
   4196352  10934272    3  freebsd-zfs  (5.2G)
  15130624       976       - free -  (488K)


The directory /usr contains 620 MB of files.
Command:
# du -sh /usr/
620M	/usr/


It takes 4:31.4 (4 minutes, 31 seconds) to copy /usr to /root.
Command: time rsync -av /usr /root/

sent 1,458,070,436 bytes  received 952,593 bytes  5,354,212.95 bytes/sec
total size is 1,454,334,520  speedup is 1.00
13.724u 17.240s 4:31.94 11.3%	433+221k 59891+57303io 0pf+0w

I have a 32 GB USB 3.0 stick. The output of 'usbconfig' is:

ugen0.5: <Kingston DataTraveler 3.0> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (126mA)

The output of 'gpart':

root@atlantis:~ # gpart show da2
=>      40  60604336  da2  GPT  (29G)
        40      2008       - free -  (1.0M)
      2048  60600320    1  freebsd-ufs  (29G)
  60602368      2008       - free -  (1.0M)

There is a UFS file system on the USB 3.0 devive.
It is mounted in /mnt/ufs

It takes 31:34.57 to copy /usr to /mnt/ufs,
Command: time rsync -av /usr/ /mnt/ufs/

sent 1,458,070,404 bytes  received 952,532 bytes  770,136.15 bytes/sec
total size is 1,454,334,520  speedup is 1.00
13.128u 12.211s 31:34.57 1.3%	435+222k 7678+9104io 0pf+0w

Using 'truss', I can see that a lot of calls to select() take a lot of time
before returning.

Command: 
 truss -D rsync -av /usr /mnt/ufs/ >& /root/rsync-truss-ufs-usb-3.log

root@atlantis:~ # sort -k1,1rn /root/rsync-truss-ufs-usb-3.log | head -n 20
60.003431924 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 0 (0x0)
49.165728175 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
42.159206222 select(5,{ },{ 4 },{ },{ 60.000000 }) = 1 (0x1)
41.122681159 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
39.143298378 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
38.106784375 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
38.080823335 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
33.890768013 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
32.247561086 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
29.721087531 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
28.045373886 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
27.097321654 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
26.624814418 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
26.183747112 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
23.386795589 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
22.540706824 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
22.448082140 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
21.807824638 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
21.662166794 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)
17.910288782 select(6,{ 5 },{ 4 },{ 5 },{ 60.000000 }) = 1 (0x1)


Using the 'procsystime' dtrace script, I can see the numerous calls to
select().

root@atlantis:~ # /usr/local/share/dtrace-toolkit/procsystime
Tracing... Hit Ctrl-C to end...
dtrace: 357 dynamic variable drops with non-empty dirty list
^C

Elapsed Times for all processes,

         SYSCALL          TIME (ns)
       sigreturn               4899
            open               5961
         fstatfs               6843
           close               8286
          openat              13816
           fstat              17242
       sigaction              22701
            mmap              30375
         fstatat              39743
   getdirentries              40936
     sigprocmask              99694
           write             633885
        __sysctl             873677
           ioctl            1227846
            read            1428213
        _umtx_op         9038575168
          select        17245685291


The 'hotkernel' dtrace script shows that the CPU cores are mostly idle.

root@atlantis:~ # /usr/local/share/dtrace-toolkit/hotkernel 
Sampling... Hit Ctrl-C to end.
^C
FUNCTION                                                COUNT   PCNT
kernel`cpu_activeclock                                      1   0.0%
kernel`acpi_cpu_idle                                        1   0.0%
kernel`usbd_in_polling_mode                                 1   0.0%
kernel`xhci_generic_done_sub                                1   0.0%
kernel`critical_exit_preempt                                1   0.0%
kernel`softdep_sync_buf                                     1   0.0%
kernel`rdtsc                                                1   0.0%
kernel`cpu_idle_acpi                                        2   0.0%
kernel`ithread_loop                                         2   0.0%
kernel`usbd_get_page                                        2   0.0%
kernel`xhci_setup_generic_chain_sub                         3   0.0%
kernel`em_update_stats_counters                             3   0.0%
kernel`tsc_get_timecount_low_mfence                         4   0.0%
kernel`cpu_search_highest                                   7   0.0%
kernel`xhci_interrupt_poll                                  8   0.0%
kernel`xhci_interrupt                                      16   0.0%
kernel`spinlock_exit                                       36   0.0%
kernel`cpu_idle                                          2888   3.4%
kernel`acpi_cpu_c1                                      81118  96.5%


The 'disklatency' dtrace script shows a high disk latency.

root@atlantis:~ # /usr/share/dtrace/disklatency
Tracing... Hit Ctrl-C to end.
^C

  da                                                        2
           value  ------------- Distribution ------------- count    
           76000 |                                         0        
        >= 80000 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2        

  da                                                        2          2131761
  da                                                        2          2121442
  da                                                        2            10420


ZFS is completely unusable on this USB 3.0 stick.
It took the whole night just to install FreeBSD on ZFS using the installer.


Hardware
--------

- Processor: AMD Ryzen 5 2600 Six-Core Processor
- Socket: AM4
- Memory: Kingston 8GB DDR4-2400ECC Unbuffered DIMM CL171Rx81.2V Micron EDie (Server Premier)
- Motherboard: ASRock B450 Gaming-ITX/ac
- Chipset: AMD Promontory B450
- Storage (operating system): 2 x Kingston DataTraveler 2.0 7.2 GiB
- Storage: Kingston DataTraveler 3.0 PHISON  USB3
Comment 1 Hans Petter Selasky freebsd_committer 2020-02-24 08:07:55 UTC
You might want to check the latency using usbdump:

usbdump -i usbus1 -f 2 -s 65536

To see exactly what is going on.

--HPS
Comment 2 Sebastien Boisvert 2020-02-24 12:26:02 UTC
Created attachment 211891 [details]
Output of usbdump for the USB 2.0 stick (which works fine)
Comment 3 Sebastien Boisvert 2020-02-24 12:27:21 UTC
Created attachment 211892 [details]
Output of usbdump for the USB 3.0 stick (which is 10x times slower than the USB 2.0 stick)
Comment 4 Hans Petter Selasky freebsd_committer 2020-02-24 12:38:38 UTC
It looks like the SCSI/CAM layer is probing something every 3 seconds and doesn't get the expected result. It does not look like a device problem.

Can you try to set before plugging the device:

sysctl hw.usb.umass.debug=-1

And then collect messages in dmesg?

--HPS
Comment 5 Sebastien Boisvert 2020-02-24 12:45:09 UTC
(In reply to Hans Petter Selasky from comment #1)

Hello Hans Petter Selasky,

Note that one the USB 2.0 stick is on the bus usbus1 (as device ugen1.2) while
the other USB 2.0 stick is on the bus usbus0 (as device ugen0.3). The USB 3.0 stick is on the bus usbus0 (as device ugen0.5).

I ran 'usbdump' with those commands:

# usbdump -i usbus1 -f 2 -s 65536 > /root/Kingston_DataTraveler_2.0-usbus1-device-2-latency.txt

# usbdump -i usbus0 -f 5 -s 65536 > /root/Kingston_DataTraveler_3.0-usbus0-device-5-latency.txt

Here are the information your requested:

Kingston_DataTraveler_2.0-usbus1-device-2-latency.txt.gz
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=211891

Kingston_DataTraveler_3.0-usbus0-device-5-latency.txt.gz
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=211892
Comment 6 Sebastien Boisvert 2020-02-24 22:41:25 UTC
Created attachment 211911 [details]
Output of dmesg with hw.usb.umass.debug=-1
Comment 7 Hans Petter Selasky freebsd_committer 2020-02-24 23:01:12 UTC
Is this output the same for both USB 2.0 and USB 3.0?

Warner:

Why is the data transfer 8K for a sense length of 252 bytes?

ccb->csio.dxfer_len = 8192, ccb->csio.sense_len = 252

umass1:umass_cam_action: 6:0:0:XPT_SCSI_IO: cmd: 0x2a, flags: 0x80, 10b cmd/8192b data/252b sense

--HPS
Comment 8 Sebastien Boisvert 2020-02-24 23:06:50 UTC
Created attachment 211912 [details]
Output of dmesg with hw.usb.umass.debug=-1 with 2 Kingston DataTraveler 2.0

USB devices plugged in (no USB 3.0):

ugen1.1: <0x1022 XHCI root HUB> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)
ugen0.1: <0x1022 XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)
ugen0.2: <vendor 0x8087 product 0x0aa7> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
ugen1.2: <Kingston DataTraveler 2.0> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)
ugen0.3: <Kingston DataTraveler 2.0> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)
ugen0.4: <Chicony USB Keyboard> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON (100mA)
Comment 9 Hans Petter Selasky freebsd_committer 2020-02-24 23:26:42 UTC
ugen1.2: <Kingston DataTraveler 2.0> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)
ugen0.3: <Kingston DataTraveler 2.0> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)

In the trace I see umass0 and umass1. Can you point out which is which?

There are no errors in the attached traces, nor any timeouts. Everything appears to work like expected.

Warner: Nevermind. It is SCSI command 0x2a .
Comment 10 Hans Petter Selasky freebsd_committer 2020-02-24 23:27:09 UTC
In the trace both devices are enumerating like USB2.0 ???
Comment 11 Hans Petter Selasky freebsd_committer 2020-02-24 23:30:46 UTC
One question: Can you re-do the test while disabling the swap partition?

--HPS
Comment 12 Sebastien Boisvert 2020-02-25 00:52:33 UTC
Hello Hans,

I have 2 spare Seagate IronWolf NAS 4 TB SATA3.
(I am waiting for a case and a power supply to arrive by mail in order
to complete the build.)

To simplify things, I installed FreeBSD 12.1-RELEASE on a IronWolf SATA3 disk.

This way, the swap partition on the USB 2.0 stick will not interfere.

I will give you 2 dmesg outputs:

- dmesg without any USB device being attached.
- dmesg with only the <Kingston DataTraveler 3.0> 32 GB USB stick plugged in.
Comment 13 Sebastien Boisvert 2020-02-25 00:55:56 UTC
Created attachment 211914 [details]
dmesg with umass debugging and no USB devices plugged in

### Output of 'usbconfig'
ugen1.1: <0x1022 XHCI root HUB> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)
ugen0.1: <0x1022 XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)
ugen0.2: <vendor 0x8087 product 0x0aa7> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
Comment 14 Sebastien Boisvert 2020-02-25 00:59:48 UTC
Created attachment 211915 [details]
dmesg with umass debugging and Kingston DataTraveler 3.0 plugged in (USB 3.0 device in a USB 3.0 port)

### Output of 'usbconfig'
ugen1.1: <0x1022 XHCI root HUB> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)
ugen0.1: <0x1022 XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)
ugen0.2: <vendor 0x8087 product 0x0aa7> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
ugen0.3: <Kingston DataTraveler 3.0> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (126mA)
Comment 15 Sebastien Boisvert 2020-02-27 02:07:43 UTC
I tested the <Kingston DataTraveler 3.0> USB stick on Ubuntu 18.04 (Linux)
on the same system (AMD Ryzen 5 2600, SRock B450 Gaming-ITX/ac) to make sure
that the problem is not due to the hardware.

The command

    time rsync -av /lib /mnt/ext4-on-usb-3.0/

achieves a throughput of 66 MB/s.

So the hardware is not the problem.
Comment 16 vidwer+fbsdbugs 2020-03-01 15:45:59 UTC
I'll be following this.

With the USB DA device from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235897 dd and rsync transfers to UFS and to ZFS seem to stall, after a while.
A 120-something transfer speed of a large generated file (dd if=/dev/urandom of=foo bs=1M count=1048576) stalls, drops to 10-something transfer speed, stalls again and drops to KB/s speeds. After interrupting rsync, I ran 'time sync', which takes 'real 2m15.632s'.

Because of the dd observation, perhaps 'disk cache' can't keep up with the amount of writes?

rsync to UFS on USB 2.0 storage:
sent 1,074,155,802 bytes  received 76 bytes  9,633,685.00 bytes/sec

rsync to UFS on USB 3.0 storage:
sent 1,074,155,802 bytes  received 76 bytes  2,438,492.35 bytes/sec

I can test patches on 13-CURRENT.
Comment 17 Sebastien Boisvert 2020-03-01 15:57:19 UTC
I ended up using 2 x Kingston DataTraveler 2.0 sticks for my installation of FreeBSD 12.1-RELEASE (ZFS mirror).

I also purchased 2 x USB 3.0 Sandisk sticks instead of Kingston to see if it's a vendor problem. I have not yet received those.

I do have an older Kingston USB 3.0 stick, and it is also unusable like the 2 newer Kingston USB 3.0 sticks.

In an experiment, I tried to install OmniOS r151030a on the same 2 x Kingston DataTraveler 3.0. Like with FreeBSD 12.1-RELEASE, OmniOS r151030a takes forever to install itself on USB 3.0 sticks (ZFS mirror).

In another experiment, I installed OmniOS r151030a on the 2 x Kingston DataTraveler USB 2.0 sticks. The installation went fine, but then OmniOS hangs when booting off those USB 2.0 sticks.

So, I am happy that FreeBSD 12.1-RELEASE works with USB 2.0. USB 3.0 would be great though.
Comment 18 Hans Petter Selasky freebsd_committer 2020-03-01 15:58:13 UTC
Hi,

FreeBSD doesn't have a disk cache.

Try to run usbdump at the same time as testing the disk using dd.

--HPS
Comment 19 Hans Petter Selasky freebsd_committer 2020-03-01 15:59:23 UTC
Also try:

/usr/src/tools/tools/usbtest

Make sure your memory stick is empty before running usbtest.

--HPS
Comment 20 Sebastien Boisvert 2020-03-01 16:33:50 UTC
(In reply to Hans Petter Selasky from comment #18)

> FreeBSD doesn't have a disk cache.

I don't know much about FreeBSD so maybe I am wrong, but:

In FreeBSD, ZFS ARC (adaptive replacement cache) is a disk cache for read operations for ZFS.

So, for read operations, if I have ZFS on the USB stick, then there is a disk cache for read operations through the ZFS ARC, right ?

Although that there are no disk cache for write operations, FreeBSD probably
has a queue (or queues) in the block subsystem for AHCI I/O operations (with SATA 3 for instance). This logical queue can be used to coalesce and/or order I/O commands before draining this queue to submit to the device queue.

I don't know if USB 3.0 sticks are mono-queue or multi-queue.
I know that SATA has 1 queue with a depth of 32 whereas NVMe has 64k queues with each a depth of 64k.

I know that Linux has a page cache (unified for the virtual memory and block devices and so on).
Do you think that USB 3.0 works well on Linux because of the page cache that is used for write operations ?

I mean, if userspace does not call "sync", then the data can be dirty in kernel space without being written back to the device. That usually makes I/O faster.

The documentation of "select" says:

    "to see if some of their descriptors are ready for reading,
     are ready for writing"
     --> https://www.freebsd.org/cgi/man.cgi?select

Here, select takes 60 seconds to conclude that none of the descriptors are ready. That delay is clearly abnormal.

How can I trace what is being called between the select_ENTER and select_EXIT in FreeBSD ?
Comment 21 Hans Petter Selasky freebsd_committer 2020-03-01 16:38:28 UTC
Cache only for files, not disk writes/reads.

--HPS
Comment 22 Sebastien Boisvert 2020-03-01 17:12:06 UTC
(In reply to Hans Petter Selasky from comment #21)

OK got it.

How can I trace what is being called between the select_ENTER and select_EXIT in FreeBSD ?
Comment 23 Sebastien Boisvert 2020-03-13 11:13:00 UTC
Hello,

I acquired new USB flash drives to do additional tests.

I used the same USB 3.1 Gen1 port on my computer case for all tested USB drives.
I created one GPT partition and formatted a UFS file system on it.
The benchmark is the following command that copies /usr (size: 848M):

time rsync -av /usr /mnt/USB_FLASH_DRIVE/

------------------------------------------------------------------------
Product: Kingston DataTraveler SE9 G2 USB Flash Drive 
Part Number: DTSE9G2/32GB
Interface: USB 3.0

time:
19.606u 19.922s 1:29:45.25 0.7% 433+221k 5552+228702io 0pf+0w

rsync:
sent 2,069,862,956 bytes  received 780,403 bytes  384,484.89 bytes/sec
total size is 2,066,495,106  speedup is 1.00


------------------------------------------------------------------------
Product: SanDisk Ultra Fit USB 3.1 Flash Drive 
Part Number: SDCZ430-032G-G46
Interface: USB 3.1

time:
20.155u 22.475s 19:16.30 3.6%	431+220k 19641+353403io 0pf+0w

rsync:
sent 2,069,862,956 bytes  received 780,443 bytes  1,790,439.60 bytes/sec
total size is 2,066,495,106  speedup is 1.00


------------------------------------------------------------------------
Product: SanDisk Cruzer Pop 32GB USB 2.0 Flash Drive, Tribal
Part Number: SDCZ53B-032G-B35
Interface: USB 2.0

time:
19.108u 20.545s 1:31:48.85 0.7% 438+224k 14194+228758io 0pf+0w

rsync:

sent 2,069,862,956 bytes  received 780,431 bytes  375,831.45 bytes/sec
total size is 2,066,495,106  speedup is 1.00