Bug 201421 - locking issues prevent new process creation
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: usb
Version: 10.2-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-usb (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-08 10:16 UTC by Shane
Modified: 2017-01-27 22:05 UTC

See Also:


Attachments
ASUS P8H61-M LE/USB3 dmesg (12.20 KB, text/plain)
2015-09-19 12:12 UTC, Shane

Description Shane 2015-07-08 10:16:05 UTC
Current system is: FreeBSD leader.local 10.2-PRERELEASE FreeBSD 10.2-PRERELEASE #13 r285123: Sat Jul  4 15:30:32 ACST 2015     root@leader.local:/usr/obj/usr/src/sys/GENERIC  amd64

For some time I have had repeated trouble that forces me to restart. A couple of replies on the freebsd-stable mailing list suggested it is locking related, but offered no suggestions for gathering more useful data. This isn't something where I can say 'do this and it breaks'; I can only say it has been a long time since I have had more than a few days of uptime. Before changing to 10.1 (from 9.2) I only restarted about once a month, after installing updates.

While most programs that are already running keep running OK, many new processes fail to start. An instance of top started earlier stops updating (usually my first confirmation that I need to restart), and ps and procstat always fail. I can still run `xterm -e tcsh -f` to bypass my shell login script.

The only information I know how to collect is `thread apply all bt` from kgdb. What I have collected, along with a copy of boot.dmesg, can be found at http://shaneware.biz/freebsddebugdata/

For the last two months I have included the uptime output; you will find 6 occasions on which I made it past one day of uptime. I have had other restarts, but these are the occasions when I was able to get kgdb output; the others have usually been complete lockups, mostly while using poudriere.

On at least two occasions I have restarted, logged in and run startxfce4, then left the machine idle while I did something else, only to return one or two hours later and have to restart again before I could do anything.

The few times I have got output from `procstat -kk -a` were when inserting a USB memstick failed to create a new device entry, preventing me from mounting the filesystem. Over the last 2 months this has happened on three occasions, and each time I was able to keep running, without adding USB devices, for between three and seven days of uptime, which is better than any other uptime I have had. This makes me think that USB or device creation may be where this issue lies.
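When `procstat -kk -a` output is available, it can be filtered for threads whose kernel stack passes through USB code, to see which threads are stuck there. The sketch below is a minimal Python example, assuming the usual procstat layout (numeric PID and TID first, frames written as `function+0xoffset`). The `SAMPLE` lines and the function name `threads_matching` are hypothetical illustrations, not taken from the debug data linked above.

```python
import re

# A kernel stack frame in procstat -kk output is written as "function+0xoffset".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+")

# Hypothetical sample output for illustration only (not the reporter's data).
SAMPLE = """\
  PID    TID COMM             TDNAME           KSTACK
   12 100045 usb              usbus0           mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x17d usb_process+0x10c fork_exit+0x9a fork_trampoline+0xe
  830 100201 top              -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x49a _sx_xlock+0x75 fork_exit+0x9a
"""


def threads_matching(procstat_text, keywords):
    """Return (pid, tid, frames) for every thread that has a stack frame
    whose function name contains one of the given keywords."""
    hits = []
    for line in procstat_text.splitlines():
        fields = line.split()
        # Skip the header and anything not starting with numeric PID and TID.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        frames = FRAME_RE.findall(line)
        if any(key in frame for frame in frames for key in keywords):
            hits.append((int(fields[0]), int(fields[1]), frames))
    return hits


if __name__ == "__main__":
    for pid, tid, frames in threads_matching(SAMPLE, ("usb",)):
        print(pid, tid, " ".join(frames))
```

On a live system the text could come from `subprocess.run(["procstat", "-kk", "-a"], capture_output=True, text=True).stdout`, although, as noted above, procstat itself may hang once the lockup has set in.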
Comment 1 Shane 2015-09-19 12:12:19 UTC
Created attachment 161189
ASUS P8H61-M LE/USB3 dmesg
Comment 2 Shane 2015-09-19 12:14:41 UTC
Now running: FreeBSD leader.local 10.2-STABLE FreeBSD 10.2-STABLE #17 r287561: Fri Sep 11 12:15:25 ACST 2015     root@leader.local:/usr/obj/usr/src/sys/GENERIC  amd64

It seems I can shed some useful light on this issue. The issue appears to be USB related. As I mentioned, I have had times when I was unable to mount USB memsticks, and these have been getting more frequent; when this fails I have recently been able to go as long as 3 weeks of uptime.

After a recent memstick failure, I checked and was able to mount the device when it was inserted into a rear USB3 port. Also, while mounting the memstick fails, I can turn on a MIDI keyboard (connected to a rear USB2 port), have devd load the snd_uaudio kmod, and have it work as expected. The memstick insertion shows in dmesg (including the da4 device name), but the device entry fails to get created, or the partition entry fails: sometimes /dev/da4 will exist but not /dev/da4s1. Further removals and insertions keep showing in dmesg but not in /dev. This would indicate that the failure to create devices is restricted to one USB bus.
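The manual check described above (a disk announced in dmesg but missing from /dev) can be scripted. This is a minimal sketch, assuming attach lines of the form `daN at umass-simM ...` as in the boot dmesg in this comment; the function `missing_nodes`, its `slice_suffix` parameter and the da4 sample line are hypothetical. On a live system `dev_entries` would come from `os.listdir("/dev")`.

```python
import re

# Attach lines look like "da0 at umass-sim0 bus 0 scbus5 target 0 lun 0".
ATTACH_RE = re.compile(r"^(da\d+) at umass-sim\d+\b")


def missing_nodes(dmesg_lines, dev_entries, slice_suffix=None):
    """List devices announced in dmesg that have no entry in dev_entries.

    If slice_suffix is given (e.g. "s1"), the slice node (e.g. da4s1) is
    expected as well, covering the case where /dev/da4 exists but
    /dev/da4s1 does not."""
    missing = []
    for line in dmesg_lines:
        m = ATTACH_RE.match(line.strip())
        if not m:
            continue
        disk = m.group(1)
        wanted = [disk] if slice_suffix is None else [disk, disk + slice_suffix]
        for name in wanted:
            # Repeated insertions log the same name again; report it once.
            if name not in dev_entries and name not in missing:
                missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical dmesg line for a memstick inserted twice.
    lines = ["da4 at umass-sim1 bus 1 scbus6 target 0 lun 0"] * 2
    print(missing_nodes(lines, {"da0", "da1", "da2", "da3", "da4"}, "s1"))
```

Only the memstick's own lines should be passed in when `slice_suffix` is used, since card-reader LUNs without media have no slices either.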

This led me to recall that, as well as the front-mounted USB ports, I have an internal USB multi-card reader connected to the motherboard. After installing 9.0 I was unable to mount cards inserted into this reader, so I have never used it since. I have noticed that this device appears to go to sleep and its devices get removed (da0-da3), after which adding a memstick will get da0 instead of the usual da4. After disconnecting this device I am now approaching 3 days of uptime, which is the best I have had in a year while still being able to mount USB memsticks.
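The da numbering described above follows from the card reader's four LUNs (SD, CF, SM and MS) attaching as da0-da3 at boot. The sketch below parses those lines, excerpted verbatim from the boot dmesg in this comment, and computes the lowest free unit number. It assumes a new disk is given the lowest unused da number; that matches what is reported here (da4 normally, da0 once the reader's devices are gone) but is an assumption, not something the report confirms. The names `umass_luns` and `lowest_free_unit` are hypothetical.

```python
import re

LUN_RE = re.compile(r"^(da\d+) at (umass-sim\d+) bus \d+ scbus\d+ target \d+ lun (\d+)$")
DESC_RE = re.compile(r"^(da\d+): <([^>]+)>")

# Lines excerpted from the boot dmesg in this bug report.
BOOT_DMESG = """\
da0 at umass-sim0 bus 0 scbus5 target 0 lun 0
da0: <Generic USB SD Reader 1.00> Removable Direct Access SCSI-0 device
da1 at umass-sim0 bus 0 scbus5 target 0 lun 1
da1: <Generic USB CF Reader 1.01> Removable Direct Access SCSI-0 device
da2 at umass-sim0 bus 0 scbus5 target 0 lun 2
da2: <Generic USB SM Reader 1.02> Removable Direct Access SCSI-0 device
da3 at umass-sim0 bus 0 scbus5 target 0 lun 3
da3: <Generic USB MS Reader 1.03> Removable Direct Access SCSI-0 device
"""


def umass_luns(dmesg_text):
    """Map each daN to its umass SIM, LUN number and inquiry description."""
    luns = {}
    for line in dmesg_text.splitlines():
        line = line.strip()
        m = LUN_RE.match(line)
        if m:
            luns[m.group(1)] = {"sim": m.group(2), "lun": int(m.group(3))}
            continue
        m = DESC_RE.match(line)
        if m and m.group(1) in luns:
            luns[m.group(1)]["desc"] = m.group(2)
    return luns


def lowest_free_unit(used_names):
    """Lowest da unit number not in use (assumes lowest-free allocation)."""
    used = {int(name[2:]) for name in used_names}
    unit = 0
    while unit in used:
        unit += 1
    return unit


if __name__ == "__main__":
    luns = umass_luns(BOOT_DMESG)
    for name in sorted(luns):
        print(name, luns[name]["lun"], luns[name]["desc"])
    print("memstick with reader present: da%d" % lowest_free_unit(luns))
    print("memstick after reader devices removed: da%d" % lowest_free_unit([]))
```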

So it would appear that since installing 10.1 this device causes USB scanning/device creation to fail, which is a regression from 9.x. Possibly the device going to sleep, or failing to wake, causes a slow response that holds a lock for too long?

This machine has an ASUS P8H61-M LE/USB3 motherboard (http://www.asus.com/au/Motherboards/P8H61M_LEUSB3/specifications/), which has an Intel H61 Express chipset and an ASMedia USB3 controller.

With the full boot.dmesg attached, the USB buses and the card reader device were showing up as follows:

ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfb307000-0xfb3073ff irq 23 at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0

xhci0: <ASMedia ASM1042 USB 3.0 controller> mem 0xfb100000-0xfb107fff irq 19 at device 0.0 on pci5
xhci0: 32 byte context size.
usbus1 on xhci0
pcib6: <ACPI PCI-PCI bridge> irq 17 at device 28.4 on pci0
pci6: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci6
pci7: <ACPI PCI bus> on pcib7
pcib8: <ACPI PCI-PCI bridge> irq 16 at device 28.5 on pci0
pci8: <ACPI PCI bus> on pcib8
ehci1: <EHCI (generic) USB 2.0 controller> mem 0xfb306000-0xfb3063ff irq 23 at device 29.0 on pci0
usbus2: EHCI version 1.0
usbus2 on ehci1

usbus1: 5.0Gbps Super Speed USB v3.0
usbus2: 480Mbps High Speed USB v2.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
ugen1.1: <0x1b21> at usbus1
uhub1: <0x1b21 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
ugen2.1: <Intel> at usbus2
uhub2: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2

Root mount waiting for: usbus2 usbus0
uhub0: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
Root mount waiting for: usbus2 usbus0
ugen0.2: <vendor 0x8087> at usbus0
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus0
ugen2.2: <vendor 0x8087> at usbus2
uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus2
uhub3: 4 ports with 4 removable, self powered
Root mount waiting for: usbus2 usbus0
uhub4: 6 ports with 6 removable, self powered
ugen0.3: <Generic> at usbus0
umass0: <Generic Mass Storage Device, class 0/0, rev 2.00/1.29, addr 3> on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x4001
umass0:5:0:-1: Attached to scbus5
da0 at umass-sim0 bus 0 scbus5 target 0 lun 0
da0: <Generic USB SD Reader 1.00> Removable Direct Access SCSI-0 device
da0: Serial Number 058F312D81B
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
da0: quirks=0x2<NO_6_BYTE>
da1 at umass-sim0 bus 0 scbus5 target 0 lun 1
da1: <Generic USB CF Reader 1.01> Removable Direct Access SCSI-0 device
da1: Serial Number 058F312D81B
da1: 40.000MB/s transfers
da1: Attempt to query device size failed: NOT READY, Medium not present
da1: quirks=0x2<NO_6_BYTE>
da2 at umass-sim0 bus 0 scbus5 target 0 lun 2
da2: <Generic USB SM Reader 1.02> Removable Direct Access SCSI-0 device
da2: Serial Number 058F312D81B
da2: 40.000MB/s transfers
da2: Attempt to query device size failed: NOT READY, Medium not present
da2: quirks=0x2<NO_6_BYTE>
ugen2.3: <vendor 0x1a40> at usbus2
uhub5: <vendor 0x1a40 USB 2.0 Hub, class 9/0, rev 2.00/1.11, addr 3> on usbus2
da3 at umass-sim0 bus 0 scbus5 target 0 lun 3
da3: <Generic USB MS Reader 1.03> Removable Direct Access SCSI-0 device
da3: Serial Number 058F312D81B
da3: 40.000MB/s transfers
da3: Attempt to query device size failed: NOT READY, Medium not present
da3: quirks=0x2<NO_6_BYTE>
Comment 3 John Baldwin freebsd_committer freebsd_triage 2015-10-19 23:05:56 UTC
Adding Hans Petter to comment on USB.
Comment 4 Hans Petter Selasky freebsd_committer freebsd_triage 2015-10-20 08:11:02 UTC
(In reply to John Baldwin from comment #3)
Hi,

If you run "usbconfig" as root and it doesn't respond (it looks like it is hanging), it typically means USB is not able to detach one of the device drivers. In the recent past it has been known that sudden detach of USB memory sticks can cause this error, because refcounts inside CAM / SCSI do not go to zero.

Also, you need to ensure that any USB MIDI device gets closed by the application when it receives a read error. Otherwise the open file handle will block further USB enumeration, and will also lock other applications that are enumerating USB on that particular USB device.
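The blocking chain described above can be illustrated with a toy model. This is not FreeBSD's USB code: `ToyUsbDevice`, its `enum_lock`, and the timeouts are hypothetical stand-ins. A thread that takes a per-device enumeration lock and then waits for open handles to drop to zero before detaching keeps that lock for as long as any application holds a handle, so every other thread that needs the same lock (as usbconfig or another enumerating program would) is stuck behind it.

```python
import threading


class ToyUsbDevice:
    """Toy model only: detach must wait for every open handle to be closed."""

    def __init__(self):
        self.enum_lock = threading.Lock()   # stand-in for the enumeration lock
        self.refs = threading.Condition()   # guards open_handles
        self.open_handles = 0

    def open(self):
        with self.refs:
            self.open_handles += 1

    def close(self):
        with self.refs:
            self.open_handles -= 1
            self.refs.notify_all()

    def detach(self, timeout, locked=None):
        # The explore thread holds enum_lock for the whole wait. A real kernel
        # would wait indefinitely; the timeout only keeps the toy from hanging.
        with self.enum_lock:
            if locked is not None:
                locked.set()
            with self.refs:
                return self.refs.wait_for(lambda: self.open_handles == 0, timeout)


def other_enumerator_blocked(dev, detach_timeout=1.0, probe_timeout=0.1):
    """Start a detach, then try to take enum_lock as another enumerator would.
    Returns True if the second enumerator could not get the lock in time."""
    locked = threading.Event()
    explorer = threading.Thread(target=dev.detach, args=(detach_timeout, locked))
    explorer.start()
    locked.wait()  # make sure the explorer holds the lock before probing
    got_lock = dev.enum_lock.acquire(timeout=probe_timeout)
    if got_lock:
        dev.enum_lock.release()
    explorer.join()
    return not got_lock


if __name__ == "__main__":
    dev = ToyUsbDevice()
    dev.open()   # e.g. an application that never closes after a read error
    print("blocked with handle open:", other_enumerator_blocked(dev))
    dev.close()  # the application closes the handle, as recommended above
    print("blocked after close:", other_enumerator_blocked(dev))
```

The model only shows why a single leaked handle (or a refcount that never reaches zero) can stall unrelated enumeration; it says nothing about which kernel lock is actually involved in this report.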

When running KGDB, you could check the backtrace of the USB explore threads.

ps auxwH | grep -i usb

Did you try 10-stable or 9-stable rather than the release branches? From time to time bugfixes are merged to these branches.

--HPS
Comment 5 Shane 2015-10-21 08:04:45 UTC
I started running stable/10 back in December, sometimes building updates every couple of weeks.

Expanding on my last entry: it is definitely related to the USB multi-card reader that *was* attached internally. Since disconnecting it I have not had an issue; current uptime is 24 days 14 hours, the best in almost 12 months.

Sometimes I had trouble mounting a USB memstick before the locking issue that prevents process creation appeared; other times the locking happened without my trying to mount a device. At first I wasn't sure whether the two were related.

As my uptimes always varied from 20 minutes to 2 days, there may be a timing factor, where perhaps the card reader goes to sleep just as it is being scanned. I expect that is what then holds the lock that causes trouble later.

It also appeared to initially affect only the USB bus that the card reader was attached to, as in the example of a memstick failing to mount in the front case-mounted USB port while a rear USB3 port worked. Then, once the locking kicked in, neither worked.

The reader was installed when I got the machine. I first installed 9.0 (I think RC3) and updated to 9.1 and 9.2 without issue; then I installed 10.1 RC2 and the problems started.

I understand this can be hard to pinpoint, and also hard to test, as I can't reproduce the issue on cue. As no one else seems to have this issue, it may even be specific to this reader attached to this motherboard. I am happy to leave the reader disconnected and move on.