As posted to the mailing list recently, Kernels post r357547 hang during boot on my PowerMac G5 Quad, at: mountroot waiting for /dev/ada0s3 (This is with SMP disabled) I have tried revisions: 357547 - Works 357550 - hangs on boot 357556 - hangs on boot 357566 - hangs on boot 357875 - hangs on boot I think the issue may be with commit 357548 as it changes something in: head/sys/powerpc/aim/slb.c Although what its changing may be beyond me!
Hmm, r357548 shouldn't have had an effect by itself, but it could be fallout from related r357549. Could you try setting this from the boot loader or in /boot/loader.conf? vm.debug.uma_multipage_slabs="0" If that works then we can try to find which zone causes trouble. Would you be able to try out a patch that would print debugging info?
(In reply to Ryan Libby from comment #1) Hi, Yep, that works, so r357550 now boots successfully with: vm.debug.uma_multipage_slabs="0" I'm happy to test a patch and update to a more recent rev also. Thanks
Created attachment 211651 [details] Print zone layouts Okay, could you try out this patch? It won't fix anything, it will hopefully just tell us what difference r357549 makes. So, first could you try it with vm.debug.uma_multipage_slabs="0", and then grep for "keg.*layout" in /var/log/messages and attach that somewhere, and then again with vm.debug.uma_multipage_slabs="1"? There should be a number of zones which end up with different layouts between the two and likely one of those is causing the issue.
Created attachment 211652 [details] Print zone layouts Sorry, update to patch, I realized I left off a newline.
(In reply to Francis Little from comment #0) > As posted to the mailing list recently [...] I wasn't able to locate this thread, so if there's more information there then could you please either repost it here or direct me to the thread?
(In reply to Ryan Libby from comment #5) I tink he means: https://lists.freebsd.org/pipermail/freebsd-ppc/2020-February/011444.html https://lists.freebsd.org/pipermail/freebsd-ppc/2020-February/011504.html but I doubt they add anything. (The one between them is from me and does not have information useful to you.) Going in another direction . . . vm.debug.uma_multipage_slabs="1" leaves his G5 unable to boot (waiting for mountroot). So how is he to follow the directions for testing that case?
(In reply to Ryan Libby from comment #3) Hi, as Mark just mentioned, I can only get the logfile with: vm.debug.uma_multipage_slabs="0" With it set as: vm.debug.uma_multipage_slabs="1" The Machine doesn't get far enough to write the log out! Regards
(In reply to Francis Little from comment #7) If you have an appropriate swap/paging partition you might be able to have /boot/loader.conf contain an assignment of the general form: dumpdev="/dev/<???>" (substituting as necessary). Then you might be able to cause a kernel dump from ddb (say). (Of cource, this might run back into the same or other problems.) If it worked, rebooting with vm.debug.uma_multipage_slabs="0" should pick up that dump if you have room in /var/crash. The dump might contain the tail of the sequence of lines that would have gone into the log file. There might be a way to extract those lines.
(In reply to Mark Millard from comment #8) I got to thinking about my history investigating and I think this is too early for the device to be available for dump to use. Also the hangup likely prevents getting to ddb. So I set up -r357550's artifact materials on a SSD and got into ddb somewhat before the lockup. One part of acttrace looked potentially interesting (typed from a picture and avoiding typing everything for now): Tracing command cam pid 7 tid 100034 td . . . (CPU3) intr_event_handle powerpc_dispatch_intr openpic_dispatch powerpc_interrupt kernel EXI trap by __mtx_unlock_flags __mtx_unlock_flags xpt_copy_path xpt_async cam_periph_error probe_done xpt_done_process xpt_done_td fork_exit fork_trampoline -0x04 I've still got the picture to extract other details from if required. For thread 100034, "show locks" reported one: (I split the line below) exclusive sleep mutex CAM device lock (CAM device lock) r = 0 (. . .) locked @ /usr/src/sys/cam/cam_xpt.c:5490
I tried non-debug kernel builds (that involve some of my own PowerMac patches): Head -r357548 did not have the problem. Head -r357549 did have the problem. So: Confirming Ryan's expectation.
Yeah... I guess that's not going to work easily. (I guess I'm just used to debugging on my VM and captuing the console.) Actually, I _may_ be able to debug this with just information from a system with vm.debug.uma_multipage_slabs="0", and without the above patch and log, but with the output of sysctl vm.uma Otherwise, I can try to reproduce this on a VM, but I am out of town right now so I probably won't have time to try setting that up until next weekend.
(In reply to Ryan Libby from comment #11) I tried to use pictures to get your originally requested informatino. I only have pictures of the non-booting case so I only show the booting case line in full, where the non-booting case had some larger slabsize also listed. UMA Kegs layout: format 0 (ipers 7 * rsize 512) / slabsize 0x1000 = 87% eff vs. ipers 15 with slabsize 0x2000 UMA Zones layout: format 0 (ipers 2 * rsize 1792) / slabsize 0x1000 = 87% eff vs. ipers 9 with slabsize 0x4000 keg VMSPACE layout: format 0 (ipers 2 * rsize 1400) / slabsize 0x1000 = 68% eff vs. ipers 5 slabsize 0x2000 vs. ipers 8 slabsize 0x3000 keg SLB cache layout: format 0 (ipers 7 * rsize 520) / slabsize 0x1000 = 88% eff vs. ipers 15 slabsize 0x2000 keg THREAD layout: format 0 (ipers 2 * rsize 1376) / slabsize 0x1000 = 67% eff vs. ipers 5 slabsize 0x2000 vs. ipers 8 slabsize 0x3000 vs. ipers 11 slabsize 0x4000 keg NCLNODE layout: format 0 (ipers 6 * rsize 592) / slabsize 0x1000 = 86% eff vs. ipers 13 slabsize 0x2000 keg socket layout: format 0 (ipers 4 * rsize 904) / slabsize 0x1000 = 88% eff vs. ipers 9 slabsize 0x2000 keg sctp_asoc layout: format 0 (ipers 1 * rsize 2288) / slabsize 0x1000 = 55% eff vs. ipers 3 slabsize 0x2000 vs. ipers 5 slabsize 0x3000 keg sctp_raddr layout: format 0 (ipers 5 * rsize 736) / slabsize 0x1000 = 89% eff vs. ipers 11 skabsize 0x2000 I will note that sometimes FFS inode, FFS1 dinode, and FFS2 dinode lines were listed for the failing boots. In the booting case, those are the last 3 keg lines. slabsize 0x1000 always showed for any that displayed. I got all but the final lines via booting and getting to ddb> and then scrolling around, taking pictures. Picture comparison means a higher probability that I missed some example(s) that should be listed above.
(In reply to Mark Millard from comment #12) Found one that I missed earlier: keg Mountpoints layout: format 0 (ipers 1 * rsize 2944) / slabsize 0x1000 = 71% eff vs. ipers 4 slabsize 0x3000
Created attachment 211661 [details] slb_zone_init: unconditional UMA_ZONE_CONTIG (In reply to Mark Millard from comment #12) Thanks! This one looks especially interesting: > keg SLB cache layout: format 0 (ipers 7 * rsize 520) / slabsize 0x1000 = 88% eff > vs. ipers 15 slabsize 0x2000 Based on that, could either of you try out this patch? I have not compile tested it, so I apologize in advance if it is broken.
(In reply to Ryan Libby from comment #14) It boots. I'll see about getting a diff of the keg lines. It might catch things I missed.
(In reply to Mark Millard from comment #15) The diff: # more /mnt/root/keg_lines_diff.txt --- /root/keg_lines_0.txt 2020-02-15 06:06:11.318230000 +0000 +++ /root/keg_lines_1.txt 2020-02-15 06:06:55.069131000 +0000 @@ -1,3 +1,5 @@ keg UMA Kegs layout: format 0 (ipers 7 * rsize 512) / slabsize 0x1000 = 87% eff +keg UMA Kegs layout: format 0 (ipers 15 * rsize 512) / slabsize 0x2000 = 93% eff keg UMA Zones layout: format 0 (ipers 2 * rsize 1792) / slabsize 0x1000 = 87% eff +keg UMA Zones layout: format 0 (ipers 9 * rsize 1792) / slabsize 0x4000 = 98% eff keg UMA Slabs 0 layout: format 0 (ipers 50 * rsize 80) / slabsize 0x1000 = 97% eff @@ -26,2 +28,4 @@ keg VMSPACE layout: format 0 (ipers 2 * rsize 1400) / slabsize 0x1000 = 68% eff +keg VMSPACE layout: format 0 (ipers 5 * rsize 1400) / slabsize 0x2000 = 85% eff +keg VMSPACE layout: format 0 (ipers 8 * rsize 1400) / slabsize 0x3000 = 91% eff keg UPVO entry layout: format 0 (ipers 42 * rsize 96) / slabsize 0x1000 = 98% eff @@ -53,2 +57,3 @@ keg filedesc0 layout: format 0 (ipers 3 * rsize 1104) / slabsize 0x1000 = 80% eff +keg filedesc0 layout: format 0 (ipers 7 * rsize 1104) / slabsize 0x2000 = 94% eff keg rangeset pctrie nodes layout: format 0 (ipers 28 * rsize 144) / slabsize 0x1000 = 98% eff @@ -62,2 +67,5 @@ keg THREAD layout: format 0 (ipers 2 * rsize 1376) / slabsize 0x1000 = 67% eff +keg THREAD layout: format 0 (ipers 5 * rsize 1376) / slabsize 0x2000 = 83% eff +keg THREAD layout: format 0 (ipers 8 * rsize 1376) / slabsize 0x3000 = 89% eff +keg THREAD layout: format 0 (ipers 11 * rsize 1376) / slabsize 0x4000 = 92% eff keg cpuset layout: format 0 (ipers 31 * rsize 128) / slabsize 0x1000 = 96% eff @@ -92,2 +100,3 @@ keg NCLNODE layout: format 0 (ipers 6 * rsize 592) / slabsize 0x1000 = 86% eff +keg NCLNODE layout: format 0 (ipers 13 * rsize 592) / slabsize 0x2000 = 93% eff keg DIRHASH layout: format 0 (ipers 3 * rsize 1024) / slabsize 0x1000 = 75% eff @@ -101,2 +110,3 @@ keg Mountpoints layout: format 0 (ipers 1 * rsize 2944) / slabsize 0x1000 = 71% eff +keg Mountpoints layout: format 0 (ipers 4 * rsize 2944) / slabsize 0x3000 = 95% eff keg ksiginfo layout: format 0 (ipers 36 * rsize 112) / slabsize 0x1000 = 98% eff @@ -105,2 +115,3 @@ keg socket layout: format 0 (ipers 4 * rsize 904) / slabsize 0x1000 = 88% eff +keg socket layout: format 0 (ipers 9 * rsize 904) / slabsize 0x2000 = 99% eff keg unpcb layout: format 0 (ipers 15 * rsize 256) / slabsize 0x1000 = 93% eff @@ -123,4 +134,7 @@ keg sctp_asoc layout: format 0 (ipers 1 * rsize 2288) / slabsize 0x1000 = 55% eff +keg sctp_asoc layout: format 0 (ipers 3 * rsize 2288) / slabsize 0x2000 = 83% eff +keg sctp_asoc layout: format 0 (ipers 5 * rsize 2288) / slabsize 0x3000 = 93% eff keg sctp_laddr layout: format 0 (ipers 84 * rsize 48) / slabsize 0x1000 = 98% eff keg sctp_raddr layout: format 0 (ipers 5 * rsize 736) / slabsize 0x1000 = 89% eff +keg sctp_raddr layout: format 0 (ipers 11 * rsize 736) / slabsize 0x2000 = 98% eff keg sctp_chunk layout: format 0 (ipers 26 * rsize 152) / slabsize 0x1000 = 96% eff @@ -136,2 +150,3 @@ keg swpctrie layout: format 0 (ipers 28 * rsize 144) / slabsize 0x1000 = 98% eff +keg swblk layout: format 0 (ipers 29 * rsize 136) / slabsize 0x1000 = 96% eff keg FFS inode layout: format 0 (ipers 25 * rsize 160) / slabsize 0x1000 = 97% eff So: swblk missing vs. existing is something I missed, along with originally missing Mountpoints.
(In reply to Mark Millard from comment #16) The diff also shows that I missed: filedesc0
Created attachment 211663 [details] Output of slabs="0"
Created attachment 211664 [details] Output of slabs="1"
I have applied both patches and attached the output of grep "keg.*layout" in /var/log/messages for both: vm.debug.uma_multipage_slabs="0" and vm.debug.uma_multipage_slabs="1"
(In reply to Ryan Libby from comment #14) FYI: I made a -r357979 based kernel that also had the patches involved. It booted the example G5 Quad just fine. (World/base was still the artifacts material for -r357550 instead of being my normal environment.) So it appears that this is the last issue outstanding for having a booting context on these types of machines, at least for now.
(In reply to Mark Millard from comment #21) Note: I progressed my normal powerpc64 PowerMac environment to head -r357979 as well. The patch has worked fine on the G5 "Quad" for that context as well.
A commit references this bug: Author: rlibby Date: Wed Feb 19 04:46:41 UTC 2020 New revision: 358094 URL: https://svnweb.freebsd.org/changeset/base/358094 Log: powerpc: unconditionally mark SLB zones UMA_ZONE_CONTIG PR: 244118 Reported by: Francis Little <oggy at farscape.co.uk> Tested by: Francis Little, Mark Millard <marklmi at yahoo.com> Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23729 Changes: head/sys/powerpc/aim/slb.c
Regression in head from r357549, fixed in r358094.