Bug 258204 - Extremely poor performance copying directory with lot of small files on nvme
Summary: Extremely poor performance copying directory with lot of small files on nvme
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.0-STABLE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-01 21:17 UTC by Ali Abdallah
Modified: 2021-10-22 15:31 UTC (History)
0 users

See Also:


Attachments
pciconf -lvb (11.68 KB, text/plain)
2021-09-01 21:17 UTC, Ali Abdallah
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ali Abdallah 2021-09-01 21:17:06 UTC
Copying over a folder with lot of small files takes a lot of time

time cp -r /usr/src/sys /home/ali/sys
173.58 real         0.00 user         0.90 sys

/dev/gpt/freebsd-root    /        ufs     rw,noatime 2    2
/dev/gpt/freebsd-boot    /bootfs  ufs     rw         1    1
/dev/gpt/freebsd-swap    none     swap    sw         0    0 

/usr/src/sys contains ~500M of data. 

The problematic system is a T495 with nvme device. On the same machine, I tested copying the same amount of data on OpenSUSE 15.3 with xfs root partition, and it takes around 1.5 seconds. 

While copying, I see that the 'cp' command spends a lot of time on biord/biowr.

I did a test copying a single large file of 500M, and it impressively fast. 

Please let me know if you need more data.
Comment 1 Ali Abdallah 2021-09-01 21:17:44 UTC
Created attachment 227600 [details]
pciconf -lvb
Comment 2 Ali Abdallah 2021-09-02 07:25:36 UTC
# truss -c cp -r /usr/src/sys .
syscall                     seconds   calls  errors
madvise                 0.000116236      11       0
copy_file_range         0.026481268       7       0
__sysctlbyname          0.000026349       1       0
write                  94.863922672   25461       0
openat                148.953887368   50930       0
mkdir                   7.676342967    2945       0
umask                   0.000010470       2       0
fstatat                 0.421467513   56822   28411
sigaction               0.000006072       1       0
__sysctl                0.000006882       1       0
readlink                0.000006863       1       1
munmap                  0.167493547   25462       0
fstatfs                 0.000006633       1       0
pread                   0.000005861       1       0
getdirentries           0.039877899    5965       0
fcntl                   0.015446983    2946       0
close                   0.341184094   53880       0
read                    0.000026979       3       0
fstat                   0.000028032       4       0
open                    0.022203466    2950       0
sigfastblock            0.000009748       1       0
issetugid               0.000010801       2       0
mprotect                0.000027954       4       0
mmap                    0.195280501   25473       0
                      ------------- ------- -------
                      252.723877158  252874   28412

As you can see, a ridiculous amount of time is spend in openat and write system calls. On a different hardware but same FreeBSD version+UFS with sata ssd, the same copy of /usr/src/sys takes 2 seconds.
Comment 3 Ali Abdallah 2021-10-22 15:31:58 UTC
Some nvme perf test measurements 

# nvmecontrol perftest -n 1 -o read -s 4096 -t 30 nvme0ns1
Threads:  1 Size:   4096  READ Time:  30 IO/s:   34515 MB/s:  134

# nvmecontrol perftest -n 2 -o read -s 4096 -t 30 nvme0ns1
Threads:  2 Size:   4096  READ Time:  30 IO/s:   69473 MB/s:  271

diskinfo -tv /dev/nvd0p4.eli 
/dev/nda0p4.eli
	4096        	# sectorsize
	478888849408	# mediasize in bytes (446G)
	116916223   	# mediasize in sectors
	0           	# stripesize
	0           	# stripeoffset
	7277        	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	UMIS LENSE40512GMSP34MESTB3A	# Disk descr.
	SS0L25153X3RC03Y11N2	# Disk ident.
	nvme0       	# Attachment
	Yes         	# TRIM/UNMAP support
	0           	# Rotation rate in RPM

Seek times:
	Full stroke:	  250 iter in   0.059140 sec =    0.237 msec
	Half stroke:	  250 iter in   0.038418 sec =    0.154 msec
	Quarter stroke:	  500 iter in   0.075839 sec =    0.152 msec
	Short forward:	  400 iter in   0.059095 sec =    0.148 msec
	Short backward:	  400 iter in   0.076297 sec =    0.191 msec
	Seq outer:	 2048 iter in   0.220371 sec =    0.108 msec
	Seq inner:	 2048 iter in   0.312280 sec =    0.152 msec

Transfer rates:
	outside:       102400 kbytes in   1.339011 sec =    76474 kbytes/sec
	middle:        102400 kbytes in   1.125152 sec =    91010 kbytes/sec
	inside:        102400 kbytes in   1.394730 sec =    73419 kbytes/sec

The values look really bad for a NVME device.