| Summary: | L2ARC degraded, repeatedly, on Samsung SSD 950 Pro NVMe | | |
|---|---|---|---|
| Product: | Base System | Reporter: | braddeicide |
| Component: | kern | Assignee: | Andriy Gapon <avg> |
| Status: | Closed FIXED | | |
| Severity: | Affects Only Me | CC: | braddeicide, fs |
| Priority: | --- | | |
| Version: | 10.3-RELEASE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
(In reply to braddeicide from comment #0)
I do not know why that third-party script says that L2ARC is "degraded". I do not see any of the OS tools reporting any degradation.

(In reply to Andriy Gapon from comment #1)
Actually, sorry, I now see that you have highlighted the kstat.zfs.misc.arcstats.l2_writes_error counter. It would be interesting to see what errors are reported by the following dtrace one-liner:
dtrace -n 'sdt:::l2arc-iodone /args[0]->io_error != 0/ { printf("io_error = %d", args[0]->io_error); }'
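In parallel, it is worth watching whether the error counter keeps climbing while the probe runs; a minimal sh sketch (only the kstat already quoted above is assumed):

# Poll the L2ARC write-error counter every 10 seconds; a steadily
# rising value means new write errors are still being generated.
while true; do
        sysctl kstat.zfs.misc.arcstats.l2_writes_error
        sleep 10
done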
Yes, I was suspicious of the script: when I removed and re-added the device it worked for a while, yet the script kept saying degraded. What tells me it has gone wrong again, though, is zpool iostat -v: all data is dropped from the ARC, and the cache device's capacity stays 100% free despite writes.
# dtrace -n 'sdt:::l2arc-iodone /args[0]->io_error != 0/ { printf("io_error = %d", args[0]->io_error); }'
dtrace: description 'sdt:::l2arc-iodone ' matched 1 probe
dtrace: buffer size lowered to 512k
CPU ID FUNCTION:NAME
0 57400 none:l2arc-iodone io_error = 22
3 57400 none:l2arc-iodone io_error = 22
1 57400 none:l2arc-iodone io_error = 22
1 57400 none:l2arc-iodone io_error = 22
1 57400 none:l2arc-iodone io_error = 22
3 57400 none:l2arc-iodone io_error = 22
3 57400 none:l2arc-iodone io_error = 22
3 57400 none:l2arc-iodone io_error = 22
1 57400 none:l2arc-iodone io_error = 22
3 57400 none:l2arc-iodone io_error = 22
0 57400 none:l2arc-iodone io_error = 22
1 57400 none:l2arc-iodone io_error = 22
0 57400 none:l2arc-iodone io_error = 22
2 57400 none:l2arc-iodone io_error = 22
(In reply to braddeicide from comment #4)
22 is EINVAL; I wonder where it comes from. Based on the device names, it seems that GELI is involved here? Perhaps it could be some problem with it... Anyway, could you please add printing of the following values to the dtrace script: args[0]->io_offset and args[0]->io_size. Also:

diskinfo -v /dev/nvd0p3.eli

Thank you. BTW, I don't want to jump to conclusions, but it's possible that r300039 might help.

Yes, it's on a GELI provider:
dtrace -n 'sdt:::l2arc-iodone /args[0]->io_error != 0/ { printf("io_error = %d io_offset = %d io_size = %d", args[0]->io_error, args[0]->io_offset, args[0]->io_size); }'
dtrace: description 'sdt:::l2arc-iodone ' matched 1 probe
dtrace: buffer size lowered to 512k
CPU ID FUNCTION:NAME
2 57400 none:l2arc-iodone io_error = 22 io_offset = 0 io_size = 0
1 57400 none:l2arc-iodone io_error = 22 io_offset = 0 io_size = 0
0 57400 none:l2arc-iodone io_error = 22 io_offset = 0 io_size = 0
1 57400 none:l2arc-iodone io_error = 22 io_offset = 0 io_size = 0
0 57400 none:l2arc-iodone io_error = 22 io_offset = 0 io_size = 0
1 57400 none:l2arc-iodone io_error = 22 io_offset = 0 io_size = 0
0 57400 none:l2arc-iodone io_error = 22 io_offset = 0 io_size = 0
1 57400 none:l2arc-iodone io_error = 22 io_offset = 0 io_size = 0
2 57400 none:l2arc-iodone io_error = 22 io_offset = 0 io_size = 0
0 57400 none:l2arc-iodone io_error = 22 io_offset = 0 io_size = 0
# diskinfo -v /dev/nvd0p3.eli
/dev/nvd0p3.eli
4096 # sectorsize
483117756416 # mediasize in bytes (450G)
117948671 # mediasize in sectors
0 # stripesize
0 # stripeoffset
7341 # Cylinders according to firmware.
255 # Heads according to firmware.
63 # Sectors according to firmware.
S2GMNCAGA01093W # Disk ident.
Oh, 4k. This device is pretty dead set on being 512:
# diskinfo -v /dev/nvd0p3
/dev/nvd0p3
512 # sectorsize
483117760512 # mediasize in bytes (450G)
943589376 # mediasize in sectors
512 # stripesize
0 # stripeoffset
58735 # Cylinders according to firmware.
255 # Heads according to firmware.
63 # Sectors according to firmware.
S2GMNCAGA01093W # Disk ident.
I dropped the cache and the dtrace errors stopped, so that's the source all right. I detached the GELI provider, reattached it, and re-added the cache, and the errors started again.
Then I dropped the cache, detached GELI, and added nvd0p3 as cache without GELI: no errors, and the cache started building.
                 capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
cache             -      -      -      -      -      -
  nvd0p3       514M   449G      0      0      0      0
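For reference, the removal/re-add sequence above was roughly the following (a sketch; pool and device names as elsewhere in this report):

zpool remove zpool1 nvd0p3.eli    # drop the L2ARC device from the pool
geli detach nvd0p3.eli            # tear down the GELI layer
zpool add zpool1 cache nvd0p3     # re-add the raw partition as cache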
I recreated the GELI provider with a 512-byte sector size and reattached it. So far no errors, but it worked for a while the first time too. The cache is building with no dtrace errors yet; I'll keep watching.
# diskinfo -v /dev/nvd0p3.eli
/dev/nvd0p3.eli
512 # sectorsize
483117760000 # mediasize in bytes (450G)
943589375 # mediasize in sectors
0 # stripesize
0 # stripeoffset
58735 # Cylinders according to firmware.
255 # Heads according to firmware.
63 # Sectors according to firmware.
S2GMNCAGA01093W # Disk ident.
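The recreation step was along these lines (a sketch; only the -s sector-size option matters here, and any keyfile/passphrase options from the original setup are omitted):

zpool remove zpool1 nvd0p3         # drop the raw-partition cache again
geli init -s 512 /dev/nvd0p3       # -s sets the GELI provider sector size
geli attach /dev/nvd0p3            # prompts for the passphrase
zpool add zpool1 cache nvd0p3.eli  # cache on the 512-byte GELI provider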
Re: r300039, yeah, it looks useful:
------------------------------------------------------------------------
r300039 | avg | 2016-05-17 18:43:50 +1000 (Tue, 17 May 2016) | 5 lines
MFC r297848: l2arc: make sure that all writes honor ashift of a cache device
Note: no MFC stable/9 because it has become quite out of date with head,
so the merge would be quite labourious and, thus, risky.
------------------------------------------------------------------------
(In reply to braddeicide from comment #7)
> io_error = 22 io_offset = 0 io_size = 0

This looks really strange; I am not sure why the L2ARC code would issue such zio-s. Ah, it's actually the parent zio of all the zio-s for the individual buffers written in a pass. If a single child zio fails, the parent zio is marked as failed too. And another one-liner:
dtrace -n 'fbt::g_io_deliver:entry /args[1] == EINVAL/ { printf("dest: %s, offset %d, length %d", stringof(args[0]->bio_to->name), args[0]->bio_offset, args[0]->bio_length); }'
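If the sector-size mismatch is the cause, the EINVALs should come from bio offsets or lengths that are not multiples of the provider's sector size; a variant of the same probe that fires only on misaligned requests (the 4096 constant is an assumption taken from the diskinfo output above):

dtrace -n 'fbt::g_io_deliver:entry /args[1] == EINVAL && (args[0]->bio_offset % 4096 != 0 || args[0]->bio_length % 4096 != 0)/ { printf("misaligned: %s, offset %d, length %d", stringof(args[0]->bio_to->name), args[0]->bio_offset, args[0]->bio_length); }'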
This hasn't reoccurred since I changed the GELI sector size to 512 to match the underlying device. Would you like me to return to the failing configuration so we can find a better solution?

(In reply to braddeicide from comment #10)
If you could do that and then pick up and test r300039, that would be great!

I've been running r300039 in the previously problematic mismatched-sector-size configuration for a week, and it looks good:
# geli is 4k, underlying device is 512
diskinfo -v /dev/nvd0p3.eli | grep sectorsize
4096 # sectorsize
diskinfo -v /dev/nvd0p3 | grep sectorsize
512 # sectorsize
# zfs-stats
L2 ARC Summary: (HEALTHY)
# Cache is growing
                 capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
------------- -----  -----  -----  -----  -----  -----
cache             -      -      -      -      -      -
  nvd0p3.eli   139G   311G      3      3  40.0K   427K
# guess compress_failures wasn't a valid indicator
kstat.zfs.misc.arcstats.l2_compress_failures: 11644501
kstat.zfs.misc.arcstats.l2_writes_error: 0
(In reply to braddeicide from comment #12)
"Compress failure" only means that the compression didn't save any space, so a buffer eligible for compression was placed in L2ARC uncompressed.

Assuming that the problem was fixed by https://svnweb.freebsd.org/base?view=revision&revision=297848.
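A quicker overall check of whether L2ARC compression is paying off is to compare the logical and allocated cache sizes (both kstats appear in the listing above); a sketch:

# l2_size is the logical bytes cached, l2_asize the bytes actually
# allocated on the cache device; asize < size means a net saving.
sysctl kstat.zfs.misc.arcstats.l2_size kstat.zfs.misc.arcstats.l2_asize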
When placing an L2ARC on a Samsung SSD 950 Pro (NVMe), it works for a while (hours? a day?) and is then marked as degraded, with its capacity entirely unallocated. I'm using the same device for logs, and that appears to be functioning fine (not best practice, I know). Unlike the similar ticket #198242, I don't get bad checksums or I/O errors.

# zfs-stats -L
------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Jul 26 20:54:51 2016
------------------------------------------------------------------------
L2 ARC Summary: (DEGRADED)
        Passed Headroom:                1.05m
        Tried Lock Failures:            1.68m
        IO In Progress:                 268
        Low Memory Aborts:              114
        Free on Write:                  647.02k
        Writes While Full:              1.42m
        R/W Clashes:                    2
        Bad Checksums:                  0
        IO Errors:                      0
        SPA Mismatch:                   1.38k
L2 ARC Size: (Adaptive)                 298.83 MiB
        Header Size: 0.00%              0
L2 ARC Evicts:
        Lock Retries:                   37
        Upon Reading:                   0
L2 ARC Breakdown:                       44.69m
        Hit Ratio: 5.07%                2.27m
        Miss Ratio: 94.93%              42.43m
        Feeds:                          1.70m
L2 ARC Buffer:
        Bytes Scanned:                  114.97 TiB
        Buffer Iterations:              1.70m
        List Iterations:                3.13m
        NULL List Iterations:           58.42k
L2 ARC Writes:
        Writes Sent: 100.00%            1.54m

# sysctl -a | grep l2
kern.cam.ctl2cam.max_sense: 252
vfs.zfs.l2c_only_size: 0
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 0
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 52428800
vfs.zfs.l2arc_write_max: 26214400
vfs.cache.numfullpathfail2: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 58416
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 3128016
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 126433859595264
kstat.zfs.misc.arcstats.l2_write_pios: 1543758
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 1705468
kstat.zfs.misc.arcstats.l2_write_full: 1424118
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 1229880739
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 268
kstat.zfs.misc.arcstats.l2_write_in_l2: 4397896253
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 1379
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 1047813
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 1676946
-> kstat.zfs.misc.arcstats.l2_compress_failures: 143662342
kstat.zfs.misc.arcstats.l2_compress_zeros: 11159
kstat.zfs.misc.arcstats.l2_compress_successes: 1038776676
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.l2_asize: 152834048
kstat.zfs.misc.arcstats.l2_size: 338656768
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 114
kstat.zfs.misc.arcstats.l2_cdata_free_on_write: 4449
kstat.zfs.misc.arcstats.l2_free_on_write: 647030
kstat.zfs.misc.arcstats.l2_evict_l1cached: 3164010
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 37
kstat.zfs.misc.arcstats.l2_writes_lock_retry: 2181
-> kstat.zfs.misc.arcstats.l2_writes_error: 1499766
kstat.zfs.misc.arcstats.l2_writes_done: 1543758
kstat.zfs.misc.arcstats.l2_writes_sent: 1543758
kstat.zfs.misc.arcstats.l2_write_bytes: 17569594126336
kstat.zfs.misc.arcstats.l2_read_bytes: 59325997056
kstat.zfs.misc.arcstats.l2_rw_clash: 2
kstat.zfs.misc.arcstats.l2_feeds: 1705468
kstat.zfs.misc.arcstats.l2_misses: 42457839
kstat.zfs.misc.arcstats.l2_hits: 2265463
kstat.zfs.misc.arcstats.evict_l2_skip: 712
kstat.zfs.misc.arcstats.evict_l2_ineligible: 214781087744
kstat.zfs.misc.arcstats.evict_l2_eligible: 1461307185152
kstat.zfs.misc.arcstats.evict_l2_cached: 250679670784

# zpool status
  pool: zpool1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 30h28m with 0 errors on Wed Jul 20 04:06:39 2016
config:

        NAME            STATE     READ WRITE CKSUM
        zpool1          ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            da0p2.eli   ONLINE       0     0     0
            da1p2.eli   ONLINE       0     0     0
          mirror-1      ONLINE       0     0     0
            da2p2.eli   ONLINE       0     0     0
            da3p2.eli   ONLINE       0     0     0
          mirror-3      ONLINE       0     0     0
            da6p2.eli   ONLINE       0     0     0
            da7p2.eli   ONLINE       0     0     0
          mirror-4      ONLINE       0     0     0
            da4p2.eli   ONLINE       0     0     0
            da5p2.eli   ONLINE       0     0     0
        logs
          nvd0p2.eli    ONLINE       0     0     0
        cache
          nvd0p3.eli    ONLINE       0 1.21G     0

errors: No known data errors

When I remove and re-add the device, it works for a while, but then it gets into the state seen below, where its capacity is entirely free.

# zpool iostat -v
                 capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
------------- -----  -----  -----  -----  -----  -----
zpool1        11.6T  2.91T    625    186  11.6M  3.58M
  mirror      2.70T   945G    137     51  2.76M   727K
    da0p2.eli     -      -     39     15  2.78M   730K
    da1p2.eli     -      -     39     15  2.78M   730K
  mirror      2.86T   784G    183     34  2.86M   515K
    da2p2.eli     -      -     43     12  2.89M   517K
    da3p2.eli     -      -     43     11  2.89M   517K
  mirror      3.23T   406G    217     28  3.17M   452K
    da6p2.eli     -      -     54     10  3.27M   455K
    da7p2.eli     -      -     53     10  3.27M   455K
  mirror      2.80T   846G     87     45  2.80M   691K
    da4p2.eli     -      -     31     13  2.80M   694K
    da5p2.eli     -      -     31     13  2.80M   694K
logs              -      -      -      -      -      -
  nvd0p2.eli  14.1M  6.92G      0     34      0  1.60M
cache             -      -      -      -      -      -
  nvd0p3.eli   146M   450G    126    263  1.05M  12.7M
------------- -----  -----  -----  -----  -----  -----