Bug 230910 - zfs fails to mount root with error 22 when geom_mirror is loaded (possible regression)
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.2-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-geom (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-26 13:13 UTC by Volodymyr Kostyrko
Modified: 2018-08-27 20:17 UTC (History)
0 users

See Also:


Attachments
screen with error (325.81 KB, image/png)
2018-08-26 13:13 UTC, Volodymyr Kostyrko
screen1 (163.98 KB, image/png)
2018-08-27 20:04 UTC, Volodymyr Kostyrko
screen2 (151.34 KB, image/png)
2018-08-27 20:04 UTC, Volodymyr Kostyrko
screen3 (159.14 KB, image/png)
2018-08-27 20:07 UTC, Volodymyr Kostyrko

Description Volodymyr Kostyrko 2018-08-26 13:13:13 UTC
Created attachment 196562 [details]
screen with error

Mail from 2018/05/23:

I just pushed updates to one of our dev servers and found it is now unable to boot. The situation during boot resembles what was previously described at:

https://lists.freebsd.org/pipermail/freebsd-fs/2013-September/018246.html

1. gmirror grabs partitions from two devices (swap/boot).
2. ZFS fails to mount root with error 22.
3. The kernel prints "random: unblocking device".
4. ZFS retries the mount, fails again, and drops to the debug prompt.

Setting vfs.mountroot.timeout to 300 just makes the kernel print another error roughly every 12 seconds. There also seems to be a problem in that code: it decreases the remaining value by only one second for each full timeout period.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208882

^^^ Here's where I got those tunables.
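For reference, the tunable from that PR goes into /boot/loader.conf; a sketch of what I set (the 300-second value is the one mentioned above):

```
# /boot/loader.conf
# Keep retrying the root mount for up to 300 seconds
# (tunable name taken from PR 208882)
vfs.mountroot.timeout="300"
```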

My kernel configuration is rather minimal and very close to MINIMAL. Removing geom_mirror from the kernel fixes the problem, and loading geom_mirror later works without any issues.

The machine boots from UFS due to rather nasty Cisco firmware that allows only a single device to be available during boot.
Comment 1 Andriy Gapon freebsd_committer freebsd_triage 2018-08-27 06:59:42 UTC
Can you try https://lists.freebsd.org/pipermail/freebsd-fs/2013-September/018251.html from the thread you mentioned?
Comment 2 Volodymyr Kostyrko 2018-08-27 20:04:29 UTC
Created attachment 196612 [details]
screen1
Comment 3 Volodymyr Kostyrko 2018-08-27 20:04:52 UTC
Created attachment 196613 [details]
screen2
Comment 4 Volodymyr Kostyrko 2018-08-27 20:07:04 UTC
Created attachment 196614 [details]
screen3
Comment 5 Volodymyr Kostyrko 2018-08-27 20:17:59 UTC
Looks like I partially know what's going on here. When I first partitioned the drives, they had just freebsd-boot, freebsd-swap, and freebsd-zfs. When that failed to work, I carved 1G out of the last one into another partition:

# gpart show da0
=>       40  781422688  da0  GPT  (373G)
         40        256    1  freebsd-boot  (128K)
        296   67108864    2  freebsd-swap  (32G)
   67109160    2097152    3  freebsd-ufs  (1.0G)
   69206312  712216408    4  freebsd-zfs  (340G)
  781422720          8       - free -  (4.0K)

arcade@cis:/home/arcade# zpool status mycon
  pool: mycon
 state: ONLINE
  scan: scrub repaired 0 in 0h33m with 0 errors on Mon Aug 13 04:45:19 2018
config:

        NAME        STATE     READ WRITE CKSUM
        mycon       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da0p4   ONLINE       0     0     0
            da1p4   ONLINE       0     0     0
            da2p4   ONLINE       0     0     0

errors: No known data errors

Here's what it shows during a "good" boot:

Reconstructed root pool config:
version: 5000
name: mycon
state: 0
txg: 19961181
pool_guid: 14318237666280658641
hostid: 2404943215
hostname: cis.mix101.info
vdev_children: 1
features_for_read:
        com.delphix:hole_birth: false
        com.delphix:embedded_data: false

vdev_tree:
        type: root
        id: 0
        guid: 14318237666280658641
        children:
        children[0]:
                type: raidz
                id: 0
                guid: 17901688662798610243
                nparity: 1
                metaslab_array: 39
                metaslab_shift: 33
                ashift: 12
                asize: 1093949718528
                is_log: 0
                create_txg: 4
                children:
                children[0]:
                        type: disk
                        id: 0
                        guid: 3986367950682223964
                        path: /dev/da0p4
                        whole_disk: 1
                        DTL: 177
                        create_txg: 4
                children[1]:
                        type: disk
                        id: 1
                        guid: 9245294472961478945
                        path: /dev/da1p4
                        whole_disk: 1
                        DTL: 176
                        create_txg: 4
                children[2]:
                        type: disk
                        id: 2
                        guid: 5056913690226738568
                        path: /dev/da2p4
                        whole_disk: 1
                        DTL: 175
                        create_txg: 4


This means that ZFS finds valid pool labels on a partition already grabbed by geom_mirror and then fails while trying to assemble the pool, because the kernel will not allow read-write access to that provider.
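This can be checked from a running system by dumping the labels on the partition directly. A sketch, using the device names from the gpart output above (`zdb -l` only reads the ZFS label areas, it does not import anything):

```
# Dump any ZFS labels present on the repurposed partition (read-only).
# If stale pool labels survived the repartitioning, some or all of the
# four label copies will be printed here.
zdb -l /dev/da0p3
```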

I'll go zero out my ufs partitions...
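For anyone hitting the same thing, a sketch of clearing stale ZFS labels from a repurposed partition (device name assumed from the layout above; both commands destroy data on that partition, so double-check the target first):

```
# Ask ZFS itself to wipe its label areas (front and back of the device):
zpool labelclear -f /dev/da0p3

# Or the blunt approach: zero out the whole partition.
dd if=/dev/zero of=/dev/da0p3 bs=1m
```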