Bug 232152 - unrecoverable state after zroot is filled with data to 100%, can't boot, can't import
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.2-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-10 15:16 UTC by Petr Fischer
Modified: 2018-10-12 22:37 UTC
CC List: 1 user

See Also:


Description Petr Fischer 2018-10-10 15:16:47 UTC
My experience on a virtual server:

I accidentally filled the zroot zpool with data to 100% of capacity (available ZFS space was exactly 0).

After that, the system cannot boot at all, simply because the zroot pool cannot be imported anymore.

When I booted from the FreeBSD 11.1-RELEASE install CD-ROM (Live CD option), I could not import the zroot pool with: zpool import zroot
The command never returns; the zpool process sits in the "tx->tx" state forever, with CPU utilization 0%, interrupts 0%, idle 100%.

When I import the full zroot in read-only mode, that does work! But in read-only mode you can't delete any data or snapshots.

So it looks like game over. Maybe ZFS is not able to commit any records anymore on a 100% full pool, I don't know...
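
For illustration, the read-only import from the live CD looks roughly like this (a sketch; the altroot path and snapshot name are placeholders). Any attempt to delete data afterwards fails because the pool is imported read-only:

$ zpool import -f -o readonly=on -R /mnt zroot
$ zfs destroy zroot/tmp@example-snapshot    # fails: the pool is imported read-only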

My virtual server has only 512 MB of RAM (+ 1 GB swap), but it ran for 2 years without any ZFS problems (hourly snapshots, zfs send for backups, everything worked like a charm).

Is it really that dangerous to fill a zpool with data to 100%?

The situation should be simple to reproduce (see the sketch after the steps):

1) install FreeBSD on ZFS
2) fill some dataset with random data to 100% of capacity
3) check with the zfs command that there really are 0 bytes of available space
4) reboot
5) game over
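
A minimal sketch of steps 2-4 (the file path is only a placeholder; dd with no count simply writes until ZFS returns "No space left on device"):

$ dd if=/dev/urandom of=/var/tmp/filler bs=1m    # stops when the pool is full
$ zfs list -o name,available zroot               # AVAIL should now show 0
$ reboot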
Comment 1 Andriy Gapon (freebsd_committer, freebsd_triage) 2018-10-11 05:38:43 UTC
(In reply to Petr Fischer from comment #0)
ZFS is supposed to try really hard to reserve some space for its internal uses and, thus, to avoid 100% space utilization.
I wonder if there is more to this story than just writing too much data to the pool.
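
By default this reservation ("slop" space) is about 1/32 of the pool (pool size shifted right by spa_slop_shift). A quick way to check the tunable on a FreeBSD 11.x system (a sketch, assuming the legacy ZFS sysctl name on this release):

$ sysctl vfs.zfs.spa_slop_shift    # default 5, i.e. 1/32 of the pool is kept back from user writes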
Comment 2 Petr Fischer 2018-10-11 09:13:50 UTC
More to this story: zroot was filled to 100% by freebsd-update (because, stupidly, I had not made enough space available before the FreeBSD upgrade).

The zroot pool filled up during the "freebsd-update install" stage.
Comment 3 Petr Fischer 2018-10-11 16:05:11 UTC
More testing:

When I tried to import the broken pool on another machine, it apparently wedged the entire ZFS subsystem there as well.

"sudo zpool import -f -t -R /mnt/tmp 11160600761791623260 broken" command never ends, zpool process in "tx->tx" state, forever. "state" from ps command is "D+".

Any other ZFS command, such as:
zpool list
zfs list

never returns either. All of those processes stay in the "D+" state forever.
I can't kill -9 them; nothing happens.
The system is now in an unusable, broken state.
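
If it helps debugging, kernel stack traces of the stuck processes can be collected even while they sit in state "D" (a sketch; the pid is a placeholder):

$ ps -axl | grep zpool      # note the pid and the MWCHAN/STAT columns
$ procstat -kk <pid>        # prints the kernel call stack of the hung thread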
Comment 4 Petr Fischer 2018-10-11 16:25:04 UTC
Read-only import testing. Command:

zpool import -f -t -R /mnt/tmp -o readonly=on 11160600761791623260 broken

OK, imported. Some outputs:

$ zpool list
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
broken      3.97G  3.89G  82.6M        -         -     0%    97%  1.00x  ONLINE  /mnt/tmp

$ zfs list -r broken
NAME                           USED  AVAIL  REFER  MOUNTPOINT
broken                        3.85G      0    88K  /mnt/tmp/zroot
broken/ROOT                   1.75G      0    88K  none
broken/ROOT/default           1.75G      0  1.17G  /mnt/tmp
broken/tmp                    7.14M      0    96K  /mnt/tmp/tmp
broken/usr                    1.95G      0    88K  /mnt/tmp/usr
broken/usr/home               5.84M      0  5.12M  /mnt/tmp/usr/home
broken/usr/jails              1.94G      0   112K  /mnt/tmp/usr/jails
broken/usr/jails/basejail      317M      0   296M  /mnt/tmp/usr/jails/basejail
broken/usr/jails/nagios       1.23G      0   363M  /mnt/tmp/usr/jails/nagios
broken/usr/jails/newjail      4.70M      0  4.67M  /mnt/tmp/usr/jails/newjail
broken/usr/jails/nginx_proxy   405M      0  92.7M  /mnt/tmp/usr/jails/nginx_proxy
broken/usr/ports                88K      0    88K  /mnt/tmp/usr/ports
broken/usr/src                  88K      0    88K  /mnt/tmp/usr/src
broken/var                    46.7M      0    88K  /mnt/tmp/var
broken/var/audit                88K      0    88K  /mnt/tmp/var/audit
broken/var/crash                88K      0    88K  /mnt/tmp/var/crash
broken/var/log                45.3M      0  6.14M  /mnt/tmp/var/log
broken/var/mail               1004K      0   100K  /mnt/tmp/var/mail
broken/var/tmp                 104K      0    88K  /mnt/tmp/var/tmp
Comment 5 Matt Ahrens 2018-10-11 20:48:21 UTC
From the output supplied, it looks like everything is working as designed: The pool was allowed to get to ~97% full, at which point we reported 0 space available to filesystems (seen in "zfs list").  The remaining 82MB of unallocated space should have been enough to allow you to import the pool writeable and delete some files or datasets.  We need to figure out what ZFS was doing during the import that caused it to hang.
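
For reference, the writable import and cleanup described here would look roughly like this once the import no longer hangs (a sketch; the snapshot name is a placeholder):

$ zpool import -f -N -t -R /mnt/tmp 11160600761791623260 broken    # -N: import without mounting any datasets
$ zfs destroy broken/var/log@some-old-snapshot                     # free space by destroying an old snapshot first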