| Summary: | [cam] Possible interrupt masking trouble in sys/cam/cam_xpt.c | ||
|---|---|---|---|
| Product: | Base System | Reporter: | campt <campt> |
| Component: | kern | Assignee: | Justin T. Gibbs <gibbs> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 4.3-RELEASE | ||
| Hardware: | Any | ||
| OS: | Any | ||
Responsible Changed From-To: freebsd-bugs->gibbs
Justin - this PR seems to contain some interesting info which looks like it applies to the generic cam code. Maybe you could take a look?

State Changed From-To: open->suspended
Mark this aging PR as suspended.

State Changed From-To: suspended->closed
CAM locking has radically changed since this was submitted. No longer relevant.
We have, over a long period of time, been experiencing seemingly random crashes under FreeBSD 3.1 and now FreeBSD 4.3, all related to the disk I/O system. Our application uses a custom driver with a Qlogic ISP controller operating in target mode. After extensive auditing of our driver source code we could find no further problems. However, sanity checks added to portions of sys/cam/cam_xpt.c showed what appeared to be queue corruption due to invalid interrupt masking. The problem only shows up under rather heavy load. Our driver does a fair amount of work at interrupt level, so this may be the underlying trigger. Even so, replacing all splsoftcam() calls in sys/cam/cam_xpt.c with splcam() entirely eliminated the problem.

Specific problems we encountered:

devstat_end_transaction HELP!! busy_count for da2 < 0 (-1)

This was shown to always result from a devstat_end_transaction_buf occurring within sys/cam/scsi/scsi_da.c:dadone().

panic: xpt_run_dev_allocq: Device on queue without any work to do

After a bit of testing, this was found to be directly related to the next one:

Fatal Trap 12: page fault while in kernel mode

This was occurring within xpt_run_dev_allocq and was actually due to a NULL pointer being returned by camq_remove on the device queue.

Checks added to camq_insert and camq_remove showed that occasionally a queue entry could be added and, before camq_insert had finished, the entries count would be 0 rather than the expected 1. Particularly convincing was a test we inserted that did something similar to this:

camq_insert(..) {
    /* near the top */
    saved_entries = queue->entries;
    /* later */
    if(queue->entries < 1) {
        printf("entries < 1 %d", queue->entries);
    } else {