|Summary:||panic: spin lock held too long|
|Product:||Base System||Reporter:||Wojciech Giel <wkg21>|
|Component:||kern||Assignee:||freebsd-bugs mailing list <bugs>|
|Severity:||Affects Only Me||CC:||kib, kpraveen.lkml, vangyzen|
Description Wojciech Giel 2016-10-11 09:54:34 UTC
Created attachment 175614 [details] panic screenshot Hello, I'm trying to install FreeBSD 11.0-RELEASE on a Dell PowerEdge R620 with a PERC H310 Mini RAID controller. The controller is currently configured in JBOD mode. I get a panic consistently whenever the installer tries to access the hard drives. I have tried ZFS and UFS, with RAID and without RAID. The problem doesn't exist when I install FreeBSD 10.3. I have this machine for testing for a while, so I can help with debugging. cheers Wojciech
Comment 1 Andriy Gapon 2016-10-12 15:36:30 UTC
Would it be possible for you to try the latest snapshot from ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/12.0/ in a suitable format? The kernel in the snapshot should have more debugging facilities compiled in, so that might help to get more information.
Comment 2 Wojciech Giel 2016-10-13 13:26:28 UTC
Created attachment 175706 [details] panic with 12.0-CURRENT
Comment 3 Wojciech Giel 2016-10-13 13:28:30 UTC
I tried installing 12.0-CURRENT and got the same kernel panic; see the attachment. I also replaced the H310 controller with an H710, but the behaviour is the same.
Comment 4 Andriy Gapon 2016-10-13 14:17:00 UTC
(In reply to Wojciech Giel from comment #3) Could you please reproduce this again? Once you are at the 'db>' prompt, please execute the following commands:
db> bt
db> tid 10186 [*]
db> bt
where '10186' is a placeholder for the id reported in the "too long" message. Please capture the output. Thank you.
Comment 7 Andriy Gapon 2016-10-13 17:00:35 UTC
Apologies for my conventions, but please do not enter '[*]'; it should be just 'tid <number>'. I used '[*]' only to draw your attention to the fact that '10186' should not be entered verbatim. Sorry. Could you please get the stack traces again?
Comment 8 Andriy Gapon 2016-10-13 17:02:50 UTC
Also, it's not 'bt tid xxxx' on one line. Those are 3 separate commands: 'bt', 'tid xxxx', 'bt'. You hit enter after typing each.
Comment 11 Wojciech Giel 2016-10-14 09:26:04 UTC
tid 100245 returns No such command?
Comment 12 Andriy Gapon 2016-10-14 12:21:55 UTC
(In reply to Wojciech Giel from comment #11) I apologise again, I confused a kgdb command with a ddb command. So, instead of 'tid' it should be 'thread'. Could you please obtain the outputs again?
Comment 13 Wojciech Giel 2016-10-14 12:46:24 UTC
Created attachment 175739 [details] bt thread not much there
Comment 14 Andriy Gapon 2016-10-14 13:10:06 UTC
(In reply to Wojciech Giel from comment #13) I am a little bit confused. Could you please do the following?
<get the panic>
<take a picture>
bt
<take a picture>
thread xxxx
bt
<take a picture>
So, I want to get 3 pictures of the same panic. Thank you.
Comment 17 Wojciech Giel 2016-10-14 13:15:59 UTC
Created attachment 175744 [details] bt second time got three separate screenshots
Comment 18 Andriy Gapon 2016-10-14 13:54:35 UTC
(In reply to Wojciech Giel from comment #17) Well, my instructions started with <get the panic> <take a picture> bt So, the first picture should be taken before any commands. Please take another 3 pictures according to the instructions in comment #14.
Comment 23 Andriy Gapon 2016-10-14 16:21:04 UTC
Using your latest pictures as an example, you did 'thread 100186', but what I asked you to do was 'thread 100346'. Remember, I said the id reported in the "too long" message. That message is at the top of the first picture. So, could you please do this once again using the correct thread number?
Comment 27 Wojciech Giel 2016-10-14 16:36:55 UTC
Created attachment 175755 [details] 04.thread cd sorry. it was a bit confusing.
Comment 28 Andriy Gapon 2016-10-14 20:25:50 UTC
(In reply to Wojciech Giel from comment #27) No worries. Thank you very much! The latest information is something to chew on.
Comment 29 Andriy Gapon 2016-10-14 20:38:58 UTC
A fellow developer suggests that the following command could provide more interesting information:
show active trace
Could you please reproduce the panic and run that command? It should result in several screenfuls of output on your system, and it's important to capture them all. Thank you.
Comment 30 Wojciech Giel 2016-10-17 09:47:36 UTC
show active trace gives: "No such command"
Comment 31 Andriy Gapon 2016-11-05 08:20:49 UTC
There are newer snapshots available now, and they should have the command. If you still have access to the hardware and are interested in debugging this issue, could you please try a newer snapshot? Thank you.
Comment 59 Wojciech Giel 2016-11-07 12:55:16 UTC
(In reply to Andriy Gapon from comment #29) uploaded "several screenshot" :-).
Comment 60 Andriy Gapon 2016-11-07 14:47:06 UTC
(In reply to Wojciech Giel from comment #59) Thank you! TLDR version of the screenshots for anyone else interested in the problem: one thread panics because it waits "too long" to acquire the IPI spin lock; the lock is held by a thread waiting for a targeted TLB shootdown to be executed by other CPUs; the rest of the CPUs are idle, and acpi_cpu_idle_mwait is used for that.
Interesting attachments:
https://bz-attachments.freebsd.org/attachment.cgi?id=176718
https://bz-attachments.freebsd.org/attachment.cgi?id=176719
https://bz-attachments.freebsd.org/attachment.cgi?id=176720
Wojciech, could you please try setting the following at the loader prompt before booting the kernel?
debug.acpi.disabled=mwait
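For anyone trying to picture the situation summarised above, here is a minimal sketch of it as a toy, single-threaded Python model. It is not kernel code: SpinLock, shootdown_holder_step, spin_acquire, run and SPIN_BUDGET are hypothetical names made up for illustration, and the spin budget and CPU names are arbitrary. It only assumes what the summary says: the holder keeps the lock until every other CPU has acknowledged its request, and the waiter gives up after spinning "too long".

```python
# Toy model of the hang summarised above. All names are hypothetical and do
# not correspond to FreeBSD kernel symbols; the numbers are arbitrary.

SPIN_BUDGET = 1000  # spins the waiter tolerates before giving up


class SpinLock:
    def __init__(self, name):
        self.name = name
        self.owner = None  # tid of the current holder, or None if free

    def try_acquire(self, tid):
        if self.owner is None:
            self.owner = tid
            return True
        return False

    def release(self, tid):
        assert self.owner == tid, "only the holder may release"
        self.owner = None


def shootdown_holder_step(lock, holder_tid, pending_acks, cpus_respond):
    """One step of the holder: it keeps the lock until every CPU has acked."""
    if cpus_respond and pending_acks:
        pending_acks.pop()  # one more CPU handled the request
    if not pending_acks and lock.owner == holder_tid:
        lock.release(holder_tid)


def spin_acquire(lock, tid, holder_tid, pending_acks, cpus_respond):
    """Spin for the lock; return a 'panic' string if the budget runs out."""
    for spins in range(SPIN_BUDGET):
        if lock.try_acquire(tid):
            return f"tid {tid} acquired {lock.name} after {spins} spins"
        shootdown_holder_step(lock, holder_tid, pending_acks, cpus_respond)
    return f"panic: spin lock held too long (holder tid {lock.owner})"


def run(cpus_respond):
    lock = SpinLock("ipi")
    assert lock.try_acquire(1)          # tid 1 takes the lock first
    pending = {"cpu1", "cpu2", "cpu3"}  # CPUs that still have to acknowledge
    return spin_acquire(lock, 2, 1, pending, cpus_respond)
```

Calling run(cpus_respond=True) models the healthy case, where the acknowledgements arrive and the waiter gets the lock; run(cpus_respond=False) models idle CPUs that never react, so the holder never releases and the waiter runs out of budget.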
Comment 61 Wojciech Giel 2016-11-07 17:35:03 UTC
(In reply to Andriy Gapon from comment #60) I set that but still got the panic: spin lock held too long
Comment 62 Konstantin Belousov 2016-11-07 18:40:03 UTC
(In reply to Wojciech Giel from comment #61) I have no good suggestion about this case. First, is it really true that all other CPUs are executing idle threads? There might be one, besides the two IPI callers, which is not. Second, look for an update to your BIOS and reflash it. Update the PERC firmware as well. Third, try to boot e.g. from a USB disk; does it work? If the system boots, try to access the drives on the PERC controller. Show the verbose dmesg of the successful boot.
Comment 63 Andriy Gapon 2016-11-08 06:57:01 UTC
(In reply to Konstantin Belousov from comment #62) As far as I understand there has never been a successful boot on that system. And looking through screenshots of "show active trace" output I do not see any other running thread.
Comment 64 Konstantin Belousov 2016-11-08 10:28:59 UTC
(In reply to Andriy Gapon from comment #63) The sentence in the original report which makes me wonder is 'I'm getting panic consistently whenever installer tried to access hard drives.' I am not sure whether it means that the installer program started, or that the install media panics outright on boot. Anyway, the information I requested is what is needed to define the next steps.
Comment 65 Wojciech Giel 2016-11-08 13:38:11 UTC
(In reply to Andriy Gapon from comment #63) This machine had FreeBSD 10.0 installed. We decided to rebuild it with 11.0. It fails at the stage of spinning up the drives, sometimes while booting the install CD and sometimes when I accept the disk layout in the installer. The BIOS and RAID firmware are up to date.