Bug 144824 - [boot] [patch] boot problem on USB (root partition mounting)
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 8.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Alexander Motin
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2010-03-17 16:50 UTC by Gilles Blanc
Modified: 2022-10-17 12:34 UTC
CC List: 2 users

See Also:


Attachments
file.diff (1013 bytes, patch)
2010-03-17 16:50 UTC, Gilles Blanc

Description Gilles Blanc 2010-03-17 16:50:01 UTC
At boot, the kernel (file /sys/kern/vfs_mount.c) uses a queue to wait for devices to be initialized before it mounts root (or tries to). This queue is filled, for instance, by the USB driver (using the "root_mount_hold" function), so if we boot from a USB key, the "root_mount_prepare" function delays the root mount until USB is available (that is to say, until the queue has been emptied by calling "root_mount_rel" on all the identifiers registered by the USB driver).
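The hold mechanism described above can be pictured with a small user-space model. This is only a sketch: the names model_hold, model_rel and model_may_mount and the fixed-size table are assumptions made for illustration, not the kernel's real data structures or signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the root mount hold list. */
#define MAX_HOLDS 16

struct hold_list {
	const char *owner[MAX_HOLDS];	/* NULL marks a free slot */
	int count;			/* number of active holds */
};

/*
 * Models root_mount_hold(): register a hold for `owner` and return a
 * token identifying it, or -1 if the table is full.
 */
int
model_hold(struct hold_list *hl, const char *owner)
{
	assert(hl != NULL && owner != NULL);
	for (int i = 0; i < MAX_HOLDS; i++) {
		if (hl->owner[i] == NULL) {
			hl->owner[i] = owner;
			hl->count++;
			return (i);
		}
	}
	return (-1);
}

/* Models root_mount_rel(): drop the hold identified by `token`. */
void
model_rel(struct hold_list *hl, int token)
{
	assert(token >= 0 && token < MAX_HOLDS && hl->owner[token] != NULL);
	hl->owner[token] = NULL;
	hl->count--;
}

/* Models the wait condition: root may be mounted once no hold is left. */
bool
model_may_mount(const struct hold_list *hl)
{
	return (hl->count == 0);
}
```

In this model, if a "usbus0" hold is released before a "da0" hold is taken, model_may_mount() returns true in between, which is the kind of window in which the root mount can start too early.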

In fact, it only waits for USB to be "physically" available, not necessarily for umass or SCSI (scsi-da). More precisely, the system is not deterministic: to be mounted, a root partition on a USB key needs umass and then SCSI to be initialized. If the mount works most of the time, it is only because the 'root_holds' list is still non-empty while other threads run concurrently (for example, with a USB key plugged into usb0, the system initializes usb0 to usb7 sequentially, and during that time umass0 and da0 are initialized too).

Unfortunately, some servers are not so kind, and root mounting simply fails ('vfs_mountroot' asks 'vfs_mountroot_try' to mount the USB root partition, which is not yet available). We then end up at the "ROOT MOUNT ERROR" prompt and have to mount the partition by hand, which is not acceptable on production servers (we would have to travel several kilometers just to type "ufs:/dev/da0s1a" each time we reboot...).

The problem is not blocking for most FreeBSD users, but it prevents us from migrating our systems (which is quite a big problem).

Fix: I have tried to add locks (root mount holds) in the umass and SCSI drivers. In the umass driver, this is in the /sys/dev/usb/storage/umass.c file, in the 'umass_attach' function (on our Supermicro server umass has enough time to initialize, but I wanted to be rigorous). In the SCSI driver, it is in the /sys/cam/scsi/scsi_da.c file, in the 'dastart' function, in the "DA_STATE_PROBE2" case of the switch. Unfortunately, between these two lock/unlock pairs, the root mounting thread gets to run, and since the list is empty during this very short time, it tries to mount the root partition and fails as usual. It is not possible to take a lock in umass and release it in SCSI, because the API needs the pointer to the entry in the lock list at removal.

So another solution has to be considered, and that is what I propose with this patch. In vfs_mountroot_try, I simply call the 'kernel_mount' function several times, with a short pause between attempts. The number of retries is 3 by default, but can be customized through the new "vfs.root.mounttrymax" option in /boot/loader.conf (it can even be set to 0 to go back to the initial behavior). Each time the mount fails and a retry is still allowed, a message is printed, the thread sleeps for one second, and then tries again. If it is really impossible to mount root, we continue with the normal prompt.
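The control flow of the retry can be sketched in user space as follows. This is not the patch itself: mount_with_retries, try_mount_fn, pause_fn and the fake device are hypothetical names used only to show the bounded-retry logic, with a callback standing in for the real mount attempt.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback standing in for one mount attempt (0 = success). */
typedef int (*try_mount_fn)(void *ctx);

/*
 * Attempt the mount once, then retry up to max_retries more times while it
 * keeps failing. pause_fn (may be NULL) stands in for the one-second sleep
 * between attempts. Returns the error of the last attempt.
 */
int
mount_with_retries(try_mount_fn try_mount, void *ctx, int max_retries,
    void (*pause_fn)(void))
{
	int error;
	int retries = 0;

	assert(try_mount != NULL);
	for (;;) {
		error = try_mount(ctx);
		if (error == 0 || retries >= max_retries)
			break;
		printf("Mount failed (error %d), retry %d of %d\n",
		    error, retries + 1, max_retries);
		if (pause_fn != NULL)
			pause_fn();
		retries++;
	}
	return (error);
}

/* Hypothetical fake device: the first `ready_after` attempts fail. */
struct fake_dev {
	int attempts;
	int ready_after;
};

int
fake_try_mount(void *ctx)
{
	struct fake_dev *d = ctx;

	d->attempts++;
	return (d->attempts > d->ready_after ? 0 : ENXIO);
}
```

With max_retries set to 3, a device that becomes ready after two failed attempts is mounted on the third attempt; with max_retries set to 0 the function makes a single attempt, which corresponds to the behavior described for setting "vfs.root.mounttrymax" to 0.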

That said, there are still some problems on some USB ports (the other ports on the same machine work fine on the first or second retry). I suspect a deeper problem in 'kernel_mount', because using the prompt doesn't mount the device either, or worse, can lead to a page fault or a lock-up. But my patch is enough to resolve the original problem as far as is possible in the current state of things.

I hope it will be reviewed and accepted as soon as possible.

Patch attached with submission follows:
How-To-Repeat: If you have a machine presenting this problem, you can reproduce it easily (it fails 95% of the time); if not (as on my development laptop), you will never manage to make it fail.
Comment 1 Eir Nym 2010-03-31 09:29:54 UTC
I have the same issue, usb/145184. I've tried your patch, but it doesn't
work :(

--
  With pleasure
Comment 2 Daniel Hartmeier 2010-07-16 11:56:05 UTC
You have to move the ma initialization inside the retry loop,
because kernel_mount() frees it, otherwise I get a kernel panic.

With that changed, the patch solves the issue with an Intel
S5000PAL board booting from USB, where da0 attaches slightly
too late. Possibly related to the RMM2 (remote management
module), which attaches multiple (virtual) CD-ROM drives to
USB, which produce CAM/SCSI status errors.

Daniel

--- vfs_mount.c 30 Jan 2010 12:11:21 -0000      1.312.2.3
+++ vfs_mount.c 16 Jul 2010 10:38:46 -0000
@@ -1798,6 +1798,8 @@
        int             error;
        char            patt[32];
        char            errmsg[255];
+       char            nbtry;
+       int             rootmounttrymax;

        vfsname = NULL;
        path    = NULL;
@@ -1805,6 +1807,8 @@
        ma      = NULL;
        error   = EINVAL;
        bzero(errmsg, sizeof(errmsg));
+       nbtry   = 0;
+       rootmounttrymax = 3;

        if (mountfrom == NULL)
                return (error);         /* don't complain */
@@ -1821,13 +1825,23 @@
        if (path[0] == '\0')
                strcpy(path, ROOTNAME);

-       ma = mount_arg(ma, "fstype", vfsname, -1);
-       ma = mount_arg(ma, "fspath", "/", -1);
-       ma = mount_arg(ma, "from", path, -1);
-       ma = mount_arg(ma, "errmsg", errmsg, sizeof(errmsg));
-       ma = mount_arg(ma, "ro", NULL, 0);
-       ma = parse_mountroot_options(ma, options);
-       error = kernel_mount(ma, MNT_ROOTFS);
+       while (1) {
+               ma = NULL;
+               ma = mount_arg(ma, "fstype", vfsname, -1);
+               ma = mount_arg(ma, "fspath", "/", -1);
+               ma = mount_arg(ma, "from", path, -1);
+               ma = mount_arg(ma, "errmsg", errmsg, sizeof(errmsg));
+               ma = mount_arg(ma, "ro", NULL, 0);
+               ma = parse_mountroot_options(ma, options);
+               error = kernel_mount(ma, MNT_ROOTFS);
+               if (nbtry < rootmounttrymax && error != 0) {
+                       printf("Mount failed, retrying mount root from %s\n",
+                           mountfrom);
+                       tsleep(&rootmounttrymax, PZERO | PDROP, "mount", hz);
+                       nbtry++;
+               } else
+                       break;
+       }

        if (error == 0) {
                /*
Comment 3 Rechistov Grigory 2010-08-03 08:56:16 UTC
Experienced this issue on FreeBSD 8.1-RELEASE i386, see this bug

http://www.freebsd.org/cgi/query-pr.cgi?pr=usb/143790

for additional details.
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:58:57 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 5 Alexander Motin freebsd_committer freebsd_triage 2019-11-22 19:03:52 UTC
This commit should fix the issue on head: https://svnweb.freebsd.org/changeset/base/355010 .
Comment 6 Graham Perrin freebsd_committer freebsd_triage 2022-10-17 12:34:43 UTC
Keyword: 

    patch
or  patch-ready

– in lieu of summary line prefix: 

    [patch]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>