Bug 172952

Summary: 9.1 hangs on reboot after all buffers synced
Product: Base System
Reporter: Travis Mikalson <bofh>
Component: kern
Assignee: Andriy Gapon <avg>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Travis Mikalson 2012-10-22 11:20:00 UTC
This is similar to PR 172086, but circumstances are different enough to warrant a new PR. Somewhere between 9.0-RELEASE and 9.1-RC, the ability to reliably soft reboot has been seriously compromised. When trying to "reboot" or "shutdown -r now", FreeBSD hangs indefinitely after "All buffers synced." The system is still pingable but otherwise non-responsive and requires a power-cycle. This problem is confirmed to not occur at all using 9.0-RELEASE-p4 world+kernel built from base/releng/9.0 48 hours ago.

On two different servers with Supermicro H8SGL-F motherboards (AMD SR5650 + SP5100 Chipset, Opteron 6212 processor) and one or more mps(4) LSI SAS2008 controllers with ZFS loaded, this problem is 100% reproducible every time I attempt to reboot when using both 9.1-RC2 and base/stable/9 from 48 hours ago. These two servers cannot soft reboot, ever. I have also tried updating the BIOS on one of them and changing various BIOS settings to no avail including switching SATA controllers between AHCI and IDE emulation mode.

On a simpler Intel Core i5 760 system with two SATA drives (using AHCI) this problem occurs about 25% of the time that I attempt to reboot. I have reproduced this hang randomly once so far on the simpler Intel Core i5 760 system without ZFS loaded with only a single SATA SSD using plain UFS plus journaled softupdates.

I also built 9.1-RC2 with all changes from SVN revision 237873 patched out of it and I've confirmed that commit was not the cause of this problem.

I realize this isn't yet adequate information to find and fix, but seeing as this is happening to varying degrees on all three systems I've put 9.1-RC2 on, I feel this issue is a severe showstopper for 9.1-RELEASE and I wanted to get this PR in before the 9.1 release cycle was completed. I will submit follow-up if I narrow this down in any useful way.

Screen shot of issue: http://tog.net/reboothang-freebsd91.jpg

How-To-Repeat: I have not yet narrowed down what specifically makes this issue 100% reproducible, so without my exact hardware configuration it is difficult to tell someone else how to repeat the problem. I will follow up as soon as I determine something more useful.
Comment 1 Andriy Gapon freebsd_committer freebsd_triage 2012-10-22 20:49:36 UTC
on 22/10/2012 13:18 Travis Mikalson said the following:
> Screen shot of issue: http://tog.net/reboothang-freebsd91.jpg

Is there any additional useful information if you set sysctl debug.bootverbose=1
before reboot?

-- 
Andriy Gapon
Comment 2 Travis Mikalson 2012-10-23 09:05:17 UTC
Andriy Gapon wrote:
> on 22/10/2012 13:18 Travis Mikalson said the following:
>> Screen shot of issue: http://tog.net/reboothang-freebsd91.jpg
> 
> Is there any additional useful information if you set sysctl debug.bootverbose=1
> before reboot?

I'm afraid not. There is actually zero additional verbosity in the
shutdown kernel messages after setting sysctl debug.bootverbose=1.

I increased all other forms of additional verbosity and absolutely
nothing additional is shown during shutdown, sorry.
Comment 3 Per olof Ljungmark 2012-10-23 19:48:00 UTC
Identical behavior found on the following hardware, all HP:

DL360 G4
DL360 G5
XW6400
XW6600

All with 9-STABLE and ZFS-only filesystems and only after rebuilding
system from source. For example, we have two 360 G4 installed 9th. of
October, both were able to reboot after install. Rebuilt one two days
ago and now it cannot reboot properly except when you issue a "shutdown
-n -o -r now".

Tried
hw.usb.no_shutdown_wait=1
and
hw.acpi.handle_reboot=1
no change.

I think "Severity: non-critical" should be changed to critical because
this could potentially get you into big trouble when you try to reboot a
remote machine.
Comment 4 Andriy Gapon freebsd_committer freebsd_triage 2012-10-23 20:43:27 UTC
on 23/10/2012 11:05 Travis Mikalson said the following:
> Andriy Gapon wrote:
>> on 22/10/2012 13:18 Travis Mikalson said the following:
>>> Screen shot of issue: http://tog.net/reboothang-freebsd91.jpg
>>
>> Is there any additional useful information if you set sysctl debug.bootverbose=1
>> before reboot?
> 
> I'm afraid not. There is actually zero additional verbosity in the
> shutdown kernel messages after setting sysctl debug.bootverbose=1.
> 
> I increased all other forms of additional verbosity and absolutely
> nothing additional is shown during shutdown, sorry.
> 

Do you have DDB in your kernel?  Are you able to break to it using the keyboard
combination?
If yes, could you please execute 'ps' command in DDB and somehow capture its output?

Also,
http://people.freebsd.org/~jhb/papers/bsdcan/2008/article/node3.html
http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-online-ddb.html
-- 
Andriy Gapon
Comment 5 patfbsd 2012-10-23 21:27:52 UTC
Hello,

I see this on my server (9.1-RC2/amd64). As it is a remote server I
can't say what happens at shutdown (but the behavior is that the box
still pings and never reboots).

There was a discussion of this issue on the FreeBSD forums, but without
a solution. It looks like it is fixed on current (but by what?):
http://forums.freebsd.org/showthread.php?p=191902&highlight=hang#post191902

My box uses one zpool for the jails, one for poudriere and many
nullfs mount points (/usr/ports into jails and so on).

dmesg: http://user.lamaiziere.net/patrick/9-1hang-dmesg.txt

Regards.
Comment 6 Travis Mikalson 2012-10-24 05:37:29 UTC
Andriy, thanks for the followup. I wanted to quickly add that r240822
and r241022 from current did NOT fix this issue as some had hoped they
would. I patched those changes into 9.1-RC2 and tested.

Commit log:
http://svnweb.freebsd.org/base/head/sys/geom/geom_disk.c?view=log

Patch:
http://svnweb.freebsd.org/base/head/sys/geom/geom_disk.c?r1=240629&r2=241022&view=patch

Again, this did NOT fix the issue, as promising as it looked. I will get
DDB going on this system here and follow up again.
Comment 7 Per olof Ljungmark 2012-10-24 16:59:33 UTC
Box is in ddb now, so I can try to supply whatever info is required.


db> bt
Tracing pid 12 tid 100014 td 0xfffffe0002f9b490
kdb_enter() at kdb_enter+0x3b
kdb_break() at kdb_break+0x27
scgetc() at scgetc+0x361
sckbdevent() at sckbdevent+0xed
kbdmux_intr() at kbdmux_intr+0x43
kbdmux_kbd_intr() at kbdmux_kbd_intr+0x20
taskqueue_run_locked() at taskqueue_run_locked+0x74
taskqueue_run() at taskqueue_run+0x3a
intr_event_execute_handlers() at intr_event_execute_handlers+0xfd
ithread_loop() at ithread_loop+0x9e
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8000269cf0, rbp = 0 ---

db> ps
  pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
   22     0     0     0  DL      sdflush  0xffffffff812b76f8 [softdepflush]
   21     0     0     0  DL      kpsusp   0xfffffe00092fc5c0 [syncer]
   20     0     0     0  DL      kpsusp   0xfffffe00092fca60 [vnlru]
   19     0     0     0  DL      kpsusp   0xfffffe00092fd120 [bufdaemon]
   18     0     0     0  DL      pgzero   0xffffffff812c171c [pagezero]
    9     0     0     0  DL      psleep   0xffffffff812c08e8 [vmdaemon]
    8     0     0     0  DL      psleep   0xffffffff812c08ac [pagedaemon]
    7     0     0     0  DL      ccb_scan 0xffffffff8121b2e0 [xpt_thrd]
    6     0     0     0  DL      waiting_ 0xffffffff812b1a00 [sctp_iterator]
    5     0     0     0  DL      (threaded)                  [zfskern]
100352                   D       zvol:io  0xfffffe00097485f8 [zvol mailrouter/swa]
100344                   D       zvol:io  0xfffffe00092c4df8 [zvol mailrouter/swa]
100343                   D       tx->tx_s 0xfffffe000939e210 [txg_thread_enter]
100342                   D       tx->tx_q 0xfffffe000939e230 [txg_thread_enter]
100056                   D       l2arc_fe 0xffffffff81696bc0 [l2arc_feed_thread]
100055                   D       arc_recl 0xffffffff81686d20 [arc_reclaim_thread]
    4     0     0     0  DL      -        0xfffffe0008841a48 [fdc0]
   17     0     0     0  DL      cooling  0xfffffe000835d558 [acpi_cooling0]
   16     0     0     0  DL      tzpoll   0xffffffff812201d0 [acpi_thermal]
   15     0     0     0  DL      (threaded)                  [usb]
100043                   D       -        0xffffff8001ad0e18 [usbus2]
100042                   D       -        0xffffff8001ad0dc0 [usbus2]
100041                   D       -        0xffffff8001ad0d68 [usbus2]
100040                   D       -        0xffffff8001ad0d10 [usbus2]
100038                   D       -        0xffffff8001ac7ef0 [usbus1]
100037                   D       -        0xffffff8001ac7e98 [usbus1]
100036                   D       -        0xffffff8001ac7e40 [usbus1]
100035                   D       -        0xffffff8001ac7de8 [usbus1]
100033                   D       -        0xffffff8001abeef0 [usbus0]
100032                   D       -        0xffffff8001abee98 [usbus0]
100031                   D       -        0xffffff8001abee40 [usbus0]
100030                   D       -        0xffffff8001abede8 [usbus0]
    3     0     0     0  DL      idle     0xffffff8001aa70f0 [ciss_notify0]
    2     0     0     0  DL      ctl_work 0xffffff800076a000 [ctl_thrd]
   14     0     0     0  DL      -        0xffffffff8125fca4 [yarrow]
   13     0     0     0  DL      (threaded)                  [geom]
100011                   D       -        0xffffffff81259070 [g_down]
100010                   D       -        0xffffffff81259068 [g_up]
100009                   D       -        0xffffffff81259058 [g_event]
   12     0     0     0  RL      (threaded)                  [intr]
100050                   I                                   [swi0: uart uart]
100049                   I                                   [irq12: psm0]
100048                   I                                   [irq1: atkbd0]
100045                   I                                   [irq15: ata1]
100044                   I                                   [irq14: ata0]
100039                   I                                   [irq23: ehci0]
100034                   I                                   [irq19: uhci1]
100029                   I                                   [irq16: uhci0]
100028                   I                                   [irq26: bge1]
100027                   I                                   [irq25: bge0]
100025                   I                                   [irq24: ciss0]
100022                   I                                   [swi5: fast taskq]
100018                   I                                   [swi2: cambio]
100016                   I                                   [swi6: task queue]
100014                   Run     CPU 0                       [swi6: Giant taskq]
100008                   I                                   [swi4: clock]
100007                   I                                   [swi4: clock]
100006                   I                                   [swi3: vm]
100005                   I                                   [swi1: netisr 0]
   11     0     0     0  RL      (threaded)                  [idle]
100004                   Run     CPU 1                       [idle: cpu1]
100003                   CanRun                              [idle: cpu0]
    1     0     1     0  DLs     zcollide 0xfffffe00aa18bc80 [init]
   10     0     0     0  DL      audit_wo 0xffffffff812b6670 [audit]
    0     0     0     0  DLs     (threaded)                  [kernel]
100390                   D       -        0xfffffe000998ba00 [zil_clean]
100389                   D       -        0xfffffe000998c280 [zil_clean]
100388                   D       -        0xfffffe000998c800 [zil_clean]
100387                   D       -        0xfffffe0009813980 [zil_clean]
100386                   D       -        0xfffffe000967cb80 [zil_clean]
100385                   D       -        0xfffffe0009813580 [zil_clean]
100384                   D       -        0xfffffe00097fae00 [zil_clean]
100383                   D       -        0xfffffe00097faa00 [zil_clean]
100380                   D       -        0xfffffe0009812a80 [zil_clean]
100354                   D       -        0xfffffe00097af100 [zil_clean]
100341                   D       -        0xfffffe0009455000 [zfs_vn_rele_taskq]
100340                   D       -        0xfffffe0009455200 [zio_ioctl_intr]
100339                   D       -        0xfffffe0009455280 [zio_ioctl_issue]
100338                   D       -        0xfffffe0009455300 [zio_claim_intr]
100337                   D       -        0xfffffe0009455380 [zio_claim_issue]
100336                   D       -        0xfffffe0009455400 [zio_free_intr]
100335                   D       -        0xfffffe0009455480 [zio_free_issue_99]
100334                   D       -        0xfffffe0009455480 [zio_free_issue_98]
100333                   D       -        0xfffffe0009455480 [zio_free_issue_97]
100332                   D       -        0xfffffe0009455480 [zio_free_issue_96]
100331                   D       -        0xfffffe0009455480 [zio_free_issue_95]
100330                   D       -        0xfffffe0009455480 [zio_free_issue_94]
100329                   D       -        0xfffffe0009455480 [zio_free_issue_93]
100328                   D       -        0xfffffe0009455480 [zio_free_issue_92]
100327                   D       -        0xfffffe0009455480 [zio_free_issue_91]
100326                   D       -        0xfffffe0009455480 [zio_free_issue_90]
100325                   D       -        0xfffffe0009455480 [zio_free_issue_89]
100324                   D       -        0xfffffe0009455480 [zio_free_issue_88]
100323                   D       -        0xfffffe0009455480 [zio_free_issue_87]
100322                   D       -        0xfffffe0009455480 [zio_free_issue_86]
100321                   D       -        0xfffffe0009455480 [zio_free_issue_85]
100320                   D       -        0xfffffe0009455480 [zio_free_issue_84]
100319                   D       -        0xfffffe0009455480 [zio_free_issue_83]
100318                   D       -        0xfffffe0009455480 [zio_free_issue_82]
100317                   D       -        0xfffffe0009455480 [zio_free_issue_81]
100316                   D       -        0xfffffe0009455480 [zio_free_issue_80]
100315                   D       -        0xfffffe0009455480 [zio_free_issue_79]
100314                   D       -        0xfffffe0009455480 [zio_free_issue_78]
100313                   D       -        0xfffffe0009455480 [zio_free_issue_77]
100312                   D       -        0xfffffe0009455480 [zio_free_issue_76]
100311                   D       -        0xfffffe0009455480 [zio_free_issue_75]
100310                   D       -        0xfffffe0009455480 [zio_free_issue_74]
100309                   D       -        0xfffffe0009455480 [zio_free_issue_73]
100308                   D       -        0xfffffe0009455480 [zio_free_issue_72]
100307                   D       -        0xfffffe0009455480 [zio_free_issue_71]
100306                   D       -        0xfffffe0009455480 [zio_free_issue_70]
100305                   D       -        0xfffffe0009455480 [zio_free_issue_69]
100304                   D       -        0xfffffe0009455480 [zio_free_issue_68]
100303                   D       -        0xfffffe0009455480 [zio_free_issue_67]
100302                   D       -        0xfffffe0009455480 [zio_free_issue_66]
100301                   D       -        0xfffffe0009455480 [zio_free_issue_65]
100300                   D       -        0xfffffe0009455480 [zio_free_issue_64]
100299                   D       -        0xfffffe0009455480 [zio_free_issue_63]
100298                   D       -        0xfffffe0009455480 [zio_free_issue_62]
100297                   D       -        0xfffffe0009455480 [zio_free_issue_61]
100296                   D       -        0xfffffe0009455480 [zio_free_issue_60]
100295                   D       -        0xfffffe0009455480 [zio_free_issue_59]
100294                   D       -        0xfffffe0009455480 [zio_free_issue_58]
100293                   D       -        0xfffffe0009455480 [zio_free_issue_57]
100292                   D       -        0xfffffe0009455480 [zio_free_issue_56]
100291                   D       -        0xfffffe0009455480 [zio_free_issue_55]
100290                   D       -        0xfffffe0009455480 [zio_free_issue_54]
100289                   D       -        0xfffffe0009455480 [zio_free_issue_53]
100288                   D       -        0xfffffe0009455480 [zio_free_issue_52]
100287                   D       -        0xfffffe0009455480 [zio_free_issue_51]
100286                   D       -        0xfffffe0009455480 [zio_free_issue_50]
100285                   D       -        0xfffffe0009455480 [zio_free_issue_49]
100284                   D       -        0xfffffe0009455480 [zio_free_issue_48]
100283                   D       -        0xfffffe0009455480 [zio_free_issue_47]
100282                   D       -        0xfffffe0009455480 [zio_free_issue_46]
100281                   D       -        0xfffffe0009455480 [zio_free_issue_45]
100280                   D       -        0xfffffe0009455480 [zio_free_issue_44]
100279                   D       -        0xfffffe0009455480 [zio_free_issue_43]
100278                   D       -        0xfffffe0009455480 [zio_free_issue_42]
100277                   D       -        0xfffffe0009455480 [zio_free_issue_41]
100276                   D       -        0xfffffe0009455480 [zio_free_issue_40]
100275                   D       -        0xfffffe0009455480 [zio_free_issue_39]
100274                   D       -        0xfffffe0009455480 [zio_free_issue_38]
100273                   D       -        0xfffffe0009455480 [zio_free_issue_37]
100272                   D       -        0xfffffe0009455480 [zio_free_issue_36]
100271                   D       -        0xfffffe0009455480 [zio_free_issue_35]
100270                   D       -        0xfffffe0009455480 [zio_free_issue_34]
100269                   D       -        0xfffffe0009455480 [zio_free_issue_33]
100268                   D       -        0xfffffe0009455480 [zio_free_issue_32]
100267                   D       -        0xfffffe0009455480 [zio_free_issue_31]
100266                   D       -        0xfffffe0009455480 [zio_free_issue_30]
100265                   D       -        0xfffffe0009455480 [zio_free_issue_29]
100264                   D       -        0xfffffe0009455480 [zio_free_issue_28]
100263                   D       -        0xfffffe0009455480 [zio_free_issue_27]
100262                   D       -        0xfffffe0009455480 [zio_free_issue_26]
100261                   D       -        0xfffffe0009455480 [zio_free_issue_25]
100260                   D       -        0xfffffe0009455480 [zio_free_issue_24]
100259                   D       -        0xfffffe0009455480 [zio_free_issue_23]
100258                   D       -        0xfffffe0009455480 [zio_free_issue_22]
100257                   D       -        0xfffffe0009455480 [zio_free_issue_21]
100256                   D       -        0xfffffe0009455480 [zio_free_issue_20]
100255                   D       -        0xfffffe0009455480 [zio_free_issue_19]
100254                   D       -        0xfffffe0009455480 [zio_free_issue_18]
100253                   D       -        0xfffffe0009455480 [zio_free_issue_17]
100252                   D       -        0xfffffe0009455480 [zio_free_issue_16]
100251                   D       -        0xfffffe0009455480 [zio_free_issue_15]
100250                   D       -        0xfffffe0009455480 [zio_free_issue_14]
100249                   D       -        0xfffffe0009455480 [zio_free_issue_13]
100248                   D       -        0xfffffe0009455480 [zio_free_issue_12]
100247                   D       -        0xfffffe0009455480 [zio_free_issue_11]
100246                   D       -        0xfffffe0009455480 [zio_free_issue_10]
100245                   D       -        0xfffffe0009455480 [zio_free_issue_9]
100244                   D       -        0xfffffe0009455480 [zio_free_issue_8]
100243                   D       -        0xfffffe0009455480 [zio_free_issue_7]
100242                   D       -        0xfffffe0009455480 [zio_free_issue_6]
100241                   D       -        0xfffffe0009455480 [zio_free_issue_5]
100240                   D       -        0xfffffe0009455480 [zio_free_issue_4]
100239                   D       -        0xfffffe0009455480 [zio_free_issue_3]
100238                   D       -        0xfffffe0009455480 [zio_free_issue_2]
100233                   D       -        0xfffffe0009455500 [zio_write_intr_high]
100232                   D       -        0xfffffe0009455500 [zio_write_intr_high]
100231                   D       -        0xfffffe0009455500 [zio_write_intr_high]
100230                   D       -        0xfffffe0009455580 [zio_write_intr_7]
100229                   D       -        0xfffffe0009455580 [zio_write_intr_6]
100228                   D       -        0xfffffe0009455580 [zio_write_intr_5]
100227                   D       -        0xfffffe0009455580 [zio_write_intr_4]
100226                   D       -        0xfffffe0009455580 [zio_write_intr_3]
100225                   D       -        0xfffffe0009455580 [zio_write_intr_2]
100224                   D       -        0xfffffe0009455580 [zio_write_intr_1]
100223                   D       -        0xfffffe0009455580 [zio_write_intr_0]
100222                   D       -        0xfffffe0009455600 [zio_write_issue_hig]
100221                   D       -        0xfffffe0009455600 [zio_write_issue_hig]
100220                   D       -        0xfffffe0009455600 [zio_write_issue_hig]
100219                   D       -        0xfffffe0009455600 [zio_write_issue_hig]
100218                   D       -        0xfffffe0009455600 [zio_write_issue_hig]
100217                   D       -        0xfffffe0009455680 [zio_write_issue_1]
100216                   D       -        0xfffffe0009455680 [zio_write_issue_0]
100215                   D       -        0xfffffe0009455700 [zio_read_intr_1]
100214                   D       -        0xfffffe0009455700 [zio_read_intr_0]
100213                   D       -        0xfffffe0009455780 [zio_read_issue_7]
100212                   D       -        0xfffffe0009455780 [zio_read_issue_6]
100211                   D       -        0xfffffe0009455780 [zio_read_issue_5]
100210                   D       -        0xfffffe0009455780 [zio_read_issue_4]
100209                   D       -        0xfffffe0009455780 [zio_read_issue_3]
100208                   D       -        0xfffffe0009455780 [zio_read_issue_2]
100207                   D       -        0xfffffe0009455780 [zio_read_issue_1]
100206                   D       -        0xfffffe0009455780 [zio_read_issue_0]
100205                   D       -        0xfffffe0009455800 [zio_null_intr]
100204                   D       -        0xfffffe0009455880 [zio_null_issue]
100054                   D       -        0xfffffe0009216c00 [mca taskq]
100053                   D       -        0xfffffe0009216d00 [system_taskq_1]
100052                   D       -        0xfffffe0009216d00 [system_taskq_0]
100023                   D       -        0xfffffe000414c000 [thread taskq]
100021                   D       -        0xfffffe000414c180 [acpi_task_2]
100020                   D       -        0xfffffe000414c180 [acpi_task_1]
100019                   D       -        0xfffffe000414c180 [acpi_task_0]
100017                   D       -        0xfffffe000414c480 [kqueue taskq]
100015                   D       -        0xfffffe00040fdd00 [ffs_trim taskq]
100012                   D       -        0xfffffe0002f9dd80 [firmware taskq]
100000                   D       sched    0xffffffff81259340 [swapper]
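
A `ps` listing this long is easier to triage mechanically. The following is a minimal Python sketch, not part of the original report, that picks out the process rows of a ddb `ps` capture and lets you filter them by wait message. The row layout is assumed from the listing above; `PROC_ROW`, `parse_ps` and `sleeping_on` are hypothetical helper names, and thread rows and wrapped lines are simply skipped.

```python
import re

# Assumed layout of a ddb `ps` process row (taken from the listing above):
#   pid ppid pgrp uid state wmesg [wchan] [command]
# "(threaded)" rows carry no wchan, so that field is optional.
PROC_ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<ppid>\d+)\s+(?P<pgrp>\d+)\s+(?P<uid>\d+)\s+"
    r"(?P<state>\S+)\s+(?P<wmesg>\S+)\s+(?:(?P<wchan>0x[0-9a-f]+)\s+)?"
    r"\[(?P<cmd>[^\]]+)\]\s*$"
)


def parse_ps(text):
    """Return one dict per process row; thread rows do not match and are skipped."""
    rows = []
    for line in text.splitlines():
        m = PROC_ROW.match(line)
        if m:
            row = m.groupdict()
            row["pid"] = int(row["pid"])
            rows.append(row)
    return rows


def sleeping_on(rows, wmesg):
    """Return the process rows whose wait message equals `wmesg`."""
    return [r for r in rows if r["wmesg"] == wmesg]


# A few rows copied from the listing above.
SAMPLE = """\
   21     0     0     0  DL      kpsusp   0xfffffe00092fc5c0 [syncer]
   13     0     0     0  DL      (threaded)                  [geom]
100011                   D       -        0xffffffff81259070 [g_down]
    1     0     1     0  DLs     zcollide 0xfffffe00aa18bc80 [init]
"""

rows = parse_ps(SAMPLE)
for r in sleeping_on(rows, "zcollide"):
    print(r["pid"], r["cmd"], r["wmesg"], r["wchan"])
```

Filtering on a wait message such as `zcollide` narrows hundreds of rows down to the handful worth a `show proc` and `bt` in ddb.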
Comment 8 Andriy Gapon freebsd_committer freebsd_triage 2012-10-24 17:20:54 UTC
> 1 0 1 0 DLs zcollide 0xfffffe00aa18bc80 [init]

This looks suspicious.

Could you please do the following in ddb?

show proc 1
thread <whatever is reported as thread id above>
bt

-- 
Andriy Gapon
Comment 9 Per olof Ljungmark 2012-10-24 18:46:52 UTC
Process 1 (init) at 0xfffffe0002f7d940:
 state: NORMAL
 uid: 0  gids: 0
 parent: pid 0 at 0xffffffff81259340
 ABI: FreeBSD ELF64
 arguments: /sbin/init
 threads: 1
100002                   D       zcollide 0xfffffe00aa18bc80 [init]

db> thread 100002
[ thread pid 1 tid 100002 ]
sched_switch+0x119:     movl    %gs:0x34,%ebx

db> bt
Tracing pid 1 tid 100002 td 0xfffffe0002f80920
sched_switch() at sched_switch+0x119
mi_switch() at mi_switch+0x186
sleepq_timedwait() at sleepq_timedwait+0x42
_sleep() at _sleep+0x1c9
zfs_zget() at zfs_zget+0x1f5
zfs_get_data() at zfs_get_data+0x4a
zil_commit() at zil_commit+0x541
zfs_freebsd_write() at zfs_freebsd_write+0xba0
VOP_WRITE_APV() at VOP_WRITE_APV+0xb2
vnode_pager_generic_putpages() at vnode_pager_generic_putpages+0x1bb
vnode_pager_putpages() at vnode_pager_putpages+0xa9
vm_pageout_flush() at vm_pageout_flush+0xc0
vm_object_page_collect_flush() at vm_object_page_collect_flush+0x143
vm_object_page_clean() at vm_object_page_clean+0x143
vm_object_terminate() at vm_object_terminate+0x215
vnode_destroy_vobject() at vnode_destroy_vobject+0xb9
zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x57
vgonel() at vgonel+0x127
vflush() at vflush+0x2bf
Comment 10 Andriy Gapon freebsd_committer freebsd_triage 2012-10-24 18:53:37 UTC
on 24/10/2012 20:46 Per olof Ljungmark said the following:
> Tracing pid 1 tid 100002 td 0xfffffe0002f80920
> sched_switch() at sched_switch+0x119
> mi_switch() at mi_switch+0x186
> sleepq_timedwait() at sleepq_timedwait+0x42
> _sleep() at _sleep+0x1c9
> zfs_zget() at zfs_zget+0x1f5
> zfs_get_data() at zfs_get_data+0x4a
> zil_commit() at zil_commit+0x541
> zfs_freebsd_write() at zfs_freebsd_write+0xba0
> VOP_WRITE_APV() at VOP_WRITE_APV+0xb2
> vnode_pager_generic_putpages() at vnode_pager_generic_putpages+0x1bb
> vnode_pager_putpages() at vnode_pager_putpages+0xa9
> vm_pageout_flush() at vm_pageout_flush+0xc0
> vm_object_page_collect_flush() at vm_object_page_collect_flush+0x143
> vm_object_page_clean() at vm_object_page_clean+0x143
> vm_object_terminate() at vm_object_terminate+0x215
> vnode_destroy_vobject() at vnode_destroy_vobject+0xb9
> zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x57
> vgonel() at vgonel+0x127
> vflush() at vflush+0x2bf
> 

OK, in your case this seems to be an instance of an already known problem:
http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/15847/focus=16056
There is ongoing work to fix it.
I can point you to _my_ WIP patch and/or github branch, whichever is more
convenient for you.  There is also Justin's patch.

P.S. Others who reported the same problem: the same symptoms do not imply the same
cause, so you need to repeat the debugging steps in your environment.

-- 
Andriy Gapon
Comment 11 Per olof Ljungmark 2012-10-24 18:57:19 UTC
Sorry, backtrace not complete.

db> bt
Tracing pid 1 tid 100002 td 0xfffffe0002f80920
sched_switch() at sched_switch+0x119
mi_switch() at mi_switch+0x186
sleepq_timedwait() at sleepq_timedwait+0x42
_sleep() at _sleep+0x1c9
zfs_zget() at zfs_zget+0x1f5
zfs_get_data() at zfs_get_data+0x4a
zil_commit() at zil_commit+0x541
zfs_freebsd_write() at zfs_freebsd_write+0xba0
VOP_WRITE_APV() at VOP_WRITE_APV+0xb2
vnode_pager_generic_putpages() at vnode_pager_generic_putpages+0x1bb
vnode_pager_putpages() at vnode_pager_putpages+0xa9
vm_pageout_flush() at vm_pageout_flush+0xc0
vm_object_page_collect_flush() at vm_object_page_collect_flush+0x143
vm_object_page_clean() at vm_object_page_clean+0x143
vm_object_terminate() at vm_object_terminate+0x215
vnode_destroy_vobject() at vnode_destroy_vobject+0xb9
zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x57
vgonel() at vgonel+0x127
vflush() at vflush+0x2bf
zfs_umount() at zfs_umount+0x9f
dounmount() at dounmount+0x295
vfs_unmountall() at vfs_unmountall+0x42
kern_reboot() at kern_reboot+0x806
sys_reboot() at sys_reboot+0x6c
amd64_syscall() at amd64_syscall+0x540
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40e3dc, rsp = 0x7fffffffd6d8, rbp = 0x33d63d75 ---
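
For anyone comparing their own trace against this one, here is a minimal Python sketch (not part of the original report) that reads ddb `bt` frames and checks whether one function is running while another is still on the stack. The frame format is assumed from the trace above, the sample is abbreviated to a few of its frames, and `FRAME`, `parse_frames` and `called_beneath` are hypothetical helper names.

```python
import re

# Assumed ddb frame format, as in the trace above:
#   zfs_zget() at zfs_zget+0x1f5
FRAME = re.compile(r"^(?P<func>\w+)\(\) at (?P=func)\+0x[0-9a-f]+$")


def parse_frames(text):
    """Return function names innermost-first, the order ddb prints them."""
    frames = []
    for line in text.splitlines():
        m = FRAME.match(line.strip())
        if m:
            frames.append(m.group("func"))
    return frames


def called_beneath(frames, outer, inner):
    """True if `inner` is executing while `outer` is still on the stack."""
    if outer not in frames or inner not in frames:
        return False
    # Innermost frame comes first, so the outer caller has the larger index.
    return frames.index(inner) < frames.index(outer)


# Abbreviated selection of frames from the trace above (not the full stack).
SAMPLE = """\
_sleep() at _sleep+0x1c9
zfs_zget() at zfs_zget+0x1f5
zil_commit() at zil_commit+0x541
zfs_freebsd_write() at zfs_freebsd_write+0xba0
zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x57
vflush() at vflush+0x2bf
zfs_umount() at zfs_umount+0x9f
kern_reboot() at kern_reboot+0x806
"""

frames = parse_frames(SAMPLE)
# Print the call chain from the outermost caller down to the sleeping frame.
print(" -> ".join(reversed(frames)))
print(called_beneath(frames, "zfs_freebsd_reclaim", "zfs_zget"))
```

Read outermost-first, the chain shows the unmount during reboot reclaiming a vnode, the reclaim flushing pages through a ZFS write, and that write ending up asleep in `zfs_zget`. As comment 10 notes, the same symptoms do not imply the same cause, so a match here is only a hint.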
Comment 12 Travis Mikalson 2012-10-29 19:17:03 UTC
After putting this pre-production system into its final configuration
(installing the base system to a compact flash), I have been unable
to reproduce this issue; otherwise I would have followed up earlier. 9.1-RC2
has soft rebooted successfully for me on this system without fail many
times for days. This system is a ZFS storage server, but regardless of how
much or how little I use ZFS, it is rebooting fine now.

Andriy Gapon, I do still have one less convenient remote system left
that consistently reproduces this issue. Can I get your WIP patch to try
on that system to see if that system is experiencing the issue that
you've been working on?
Comment 13 Andriy Gapon freebsd_committer freebsd_triage 2012-10-30 08:45:50 UTC
on 29/10/2012 21:17 Travis Mikalson said the following:
> Andriy Gapon, I do still have one less convenient remote system left
> that consistently reproduces this issue. Can I get your WIP patch to try
> on that system to see if that system is experiencing the issue that
> you've been working on?

The changes can be (re)viewed here:
https://github.com/avg-I/freebsd/compare/master...avg;zfs-vfs

The diff against recent head is here:
http://people.freebsd.org/~avg/zfs-vfs.3.diff

-- 
Andriy Gapon
Comment 14 patrick.lamaiziere 2012-11-29 10:31:10 UTC
Hello,

For the record, see this recent discussion on freebsd-stable@ :
http://lists.freebsd.org/pipermail/freebsd-stable/2012-November/070637.html

The patch in 
http://lists.freebsd.org/pipermail/freebsd-stable/2012-November/070892.html
seems to solve the reboot issue for me.

Regards
Comment 15 Andriy Gapon freebsd_committer freebsd_triage 2012-12-16 16:01:17 UTC
Responsible Changed
From-To: freebsd-bugs->avg

It seems that I am working on the related code.
Comment 16 Andriy Gapon freebsd_committer freebsd_triage 2012-12-16 16:01:59 UTC
State Changed
From-To: open->patched

Should be fixed in head.
Comment 17 Andriy Gapon freebsd_committer freebsd_triage 2013-01-21 15:45:52 UTC
State Changed
From-To: patched->closed

Should be fixed in head and stable/9 now.  No MFC to stable/8 is planned 
at the moment, because of differences in the ZFS code bases.