Bug 25264

Summary: Kernel trap 12 in camisr
Product: Base System
Reporter: Yevgeniy Aleynikov <eugenea>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.2-RELEASE   
Hardware: Any   
OS: Any   

Description Yevgeniy Aleynikov 2001-02-21 22:40:00 UTC
Several times a day the kernel crashes with the following diagnostics.
Can't get a core dump because the SCSI subsystem is not functioning during the crash.

login: panic: lockmgr: pid 64908, not exclusive lock holder 380 unlocking
mp_lock = 00000001; cpuid = 0; lapic.id = 01000000
boot() called on cpu#0

syncing disks... 203 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204
giving up on 202 buffers
Uptime: 15h47m27s
(da1:ahc0:0:2:0): SYNCHRONIZE CACHE. CDB: 35 0 0 0 0 0 0 0 0 0 
(da1:ahc0:0:2:0): ILLEGAL REQUEST asc:20,0
(da1:ahc0:0:2:0): Invalid command operation code

dumping to dev #da/0x20001, offset 1048704
dump 

Fatal trap 12: page fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 01000000
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0121968
stack pointer           = 0x10:0xdbc0095c
frame pointer           = 0x10:0xdbc0096c
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 64908 (httpd)
interrupt mask          = net tty bio cam  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000002; cpuid = 0; lapic.id = 01000000
boot() called on cpu#0
Uptime: 15h47m28s


Fatal trap 12: page fault while in kernel mode
mp_lock = 00000003; cpuid = 0; lapic.id = 01000000
fault virtual address   = 0x10
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0121a84
stack pointer           = 0x10:0xdbc00578
frame pointer           = 0x10:0xdbc0058c
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 64908 (httpd)
interrupt mask          = net tty bio cam  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000003; cpuid = 0; lapic.id = 01000000
boot() called on cpu#0
Uptime: 15h47m28s


Fatal trap 12: page fault while in kernel mode
mp_lock = 00000004; cpuid = 0; lapic.id = 01000000
fault virtual address   = 0x10
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0121a84
stack pointer           = 0x10:0xdbc00194
frame pointer           = 0x10:0xdbc001a8
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 64908 (httpd)
interrupt mask          = net tty bio cam  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000004; cpuid = 0; lapic.id = 01000000
boot() called on cpu#0
Uptime: 15h47m28s

--------skipped several traps with the same instruction pointer-----

(kgdb) x/i 0xc0121968
0xc0121968 <camisr+200>:        mov    %eax,(%edx)

(kgdb) list *0xc0121968
0xc0121968 is in camisr (../../cam/cam_queue.h:224).
219
220     static __inline void
221     cam_ccbq_ccb_done(struct cam_ccbq *ccbq, union ccb *done_ccb)
222     {
223             TAILQ_REMOVE(&ccbq->active_ccbs, &done_ccb->ccb_h,
224                          xpt_links.tqe);
225             ccbq->dev_active--;
226             ccbq->dev_openings++;
227             ccbq->held++;
228     }
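
The faulting instruction above is a store (`mov %eax,(%edx)`) to address 0x0 inside the inlined TAILQ_REMOVE. One plausible reading, which is an assumption and not something the report confirms, is that it corresponds to the macro's final store through `tqe_prev`, with `tqe_prev` being NULL because the CCB was no longer (or never) on the active queue. The sketch below illustrates that condition in userland with <sys/queue.h>; `struct item`, `remove_would_write_null`, `checked_remove` and `tailq_demo` are hypothetical names used only for illustration and are not part of CAM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a CCB header: only the list linkage matters. */
struct item {
	int id;
	TAILQ_ENTRY(item) link;		/* provides tqe_next and tqe_prev */
};

TAILQ_HEAD(item_list, item);

/*
 * TAILQ_REMOVE ends with the store
 *	*(elm)->link.tqe_prev = (elm)->link.tqe_next;
 * so an element whose tqe_prev is NULL makes it write to address 0.
 * Return nonzero when that would happen.
 */
static int
remove_would_write_null(const struct item *it)
{
	return (it->link.tqe_prev == NULL);
}

/* Remove only when the linkage looks valid; 0 on success, -1 otherwise. */
static int
checked_remove(struct item_list *head, struct item *it)
{
	if (remove_would_write_null(it))
		return (-1);
	TAILQ_REMOVE(head, it, link);
	/* Clear the linkage so that a second removal is detectable. */
	memset(&it->link, 0, sizeof(it->link));
	return (0);
}

/* Walk through three cases; return 0 if all behave as described. */
int
tailq_demo(void)
{
	struct item_list head = TAILQ_HEAD_INITIALIZER(head);
	struct item queued, stale;

	queued.id = 1;
	TAILQ_INSERT_TAIL(&head, &queued, link);

	/* An element that was never queued and whose memory is zeroed. */
	memset(&stale, 0, sizeof(stale));

	if (checked_remove(&head, &stale) != -1)	/* refused: NULL tqe_prev */
		return (1);
	if (checked_remove(&head, &queued) != 0)	/* normal removal */
		return (2);
	if (checked_remove(&head, &queued) != -1)	/* double removal refused */
		return (3);
	assert(head.tqh_first == NULL);			/* list is empty again */
	return (0);
}
```

Calling `tailq_demo()` from a small `main` exercises all three cases without crashing. The unchecked kernel macro has no such guard, so a stale or doubly completed element would fault the way the trap shows.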

(kgdb) x/i 0xc0121a84
0xc0121a84 <camisr+484>:        pushl  (%eax)

(kgdb) list *0xc0121a84
0xc0121a84 is in camisr (../../cam/cam_xpt.c:6332).
6327                    } else if (runq) {
6328                            xpt_run_dev_sendq(ccb_h->path->bus);
6329                    }
6330
6331                    /* Call the peripheral driver's callback */
6332                    (*ccb_h->cbfcnp)(ccb_h->path->periph, (union ccb *)ccb_h);
6333
6334                    /* Raise IPL for while test */
6335                    s = splcam();
6336            }

----------

There is also another server that crashes in a similar way.
SCSI is terminated properly. The SCSI tape (an SE device) is alone on the
second SCSI bus.

Fix:
    None

How-To-Repeat:
    Just keep it running under heavy user load (web+cgi).
Comment 1 Eugene Aleynikov 2001-02-22 22:01:02 UTC
Another trap:
 panic: lockmgr: pid 44687, not exclusive lock holder 374 unlocking
mp_lock = 01000001; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1

syncing disks... panic: rslock: cpu: 1, addr: 0xc3d25e00, lock: 0x01000001
mp_lock = 01000001; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
Uptime: 3h57m11s
(da1:ahc0:0:2:0): SYNCHRONIZE CACHE. CDB: 35 0 0 0 0 0 0 0 0 0 
(da1:ahc0:0:2:0): ILLEGAL REQUEST asc:20,0
(da1:ahc0:0:2:0): Invalid command operation code

dumping to dev #da/0x20001, offset 1048704
dump 

Fatal trap 12: page fault while in kernel mode
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0121968
stack pointer           = 0x10:0xdbbc6800
frame pointer           = 0x10:0xdbbc6810
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 44687 (proftpd)
interrupt mask          = net tty bio cam  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
Uptime: 3h57m11s


Fatal trap 12: page fault while in kernel mode
mp_lock = 01000003; cpuid = 1; lapic.id = 00000000
fault virtual address   = 0x10
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0121a84
stack pointer           = 0x10:0xdbbc641c
frame pointer           = 0x10:0xdbbc6430
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 44687 (proftpd)
interrupt mask          = net tty bio cam  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 01000003; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
Uptime: 3h57m11s


0xc0121968 is in probedone (../../cam/cam_xpt.c:5619).
5614     path->device->serial_num_len =
5615         serial_buf->length;
5616     path->device->serial_num[serial_buf->length]
5617         = '\0';
5618    }
5619   } else if (cam_periph_error(done_ccb, 0,
5620          SF_RETRY_UA|SF_NO_PRINT,
5621          &softc->saved_ccb) == ERESTART) {
5622    return;
5623   } else if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) {
(kgdb) list *0xc0121a84
0xc0121a84 is in probedone (../../cam/cam_xpt.c:5693).
5688    /* Don't wedge the queue */
5689    xpt_release_devq(done_ccb->ccb_h.path, /*count*/1,
5690       /*run_queue*/TRUE);
5691   }
5692
5693   path->device->flags &= ~CAM_DEV_UNCONFIGURED;
5694
5695   if ((softc->flags & PROBE_NO_ANNOUNCE) == 0) {
5696    /* Inform the XPT that a new device has been found */
5697    done_ccb->ccb_h.func_code = XPT_GDEV_TYPE;
Comment 2 Eugene Aleynikov 2001-02-26 18:06:52 UTC
After we got rid of nullfs, the problem went away.
This is a nullfs issue, so this report can be closed.

The nullfs status should probably be changed to broken.
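
For context, the `lockmgr: pid ..., not exclusive lock holder ... unlocking` panic that precedes the traps in both logs is an ownership check: an exclusive lock remembers which process took it and refuses a release from any other process. The report does not say how nullfs triggers this. A stacked file system that mishandles the locks of the lower layer is one plausible way, but that is an assumption. The toy model below only reproduces the shape of the check and of the message; `struct toy_lock`, `toy_acquire`, `toy_release` and `lock_demo` are hypothetical names, not the kernel's lockmgr interface.

```c
#include <stdio.h>
#include <string.h>

#define NO_HOLDER	(-1)

/* Toy exclusive lock that records its holder; not the kernel's struct lock. */
struct toy_lock {
	int holder;		/* pid of the exclusive holder, or NO_HOLDER */
};

/* Take the lock for pid; return 0 on success, -1 if it is already held. */
static int
toy_acquire(struct toy_lock *lk, int pid)
{
	if (lk->holder != NO_HOLDER)
		return (-1);
	lk->holder = pid;
	return (0);
}

/*
 * Release the lock on behalf of pid. On an owner mismatch, format the
 * message into msg (where the kernel would panic) and return -1.
 */
static int
toy_release(struct toy_lock *lk, int pid, char *msg, size_t len)
{
	if (lk->holder != pid) {
		snprintf(msg, len,
		    "lockmgr: pid %d, not exclusive lock holder %d unlocking",
		    pid, lk->holder);
		return (-1);
	}
	lk->holder = NO_HOLDER;
	return (0);
}

/* Replay the pids from the first log; return 0 if the message matches. */
int
lock_demo(void)
{
	struct toy_lock lk = { NO_HOLDER };
	char msg[96];

	if (toy_acquire(&lk, 380) != 0)
		return (1);
	if (toy_release(&lk, 64908, msg, sizeof(msg)) != -1)
		return (2);
	return (strcmp(msg,
	    "lockmgr: pid 64908, not exclusive lock holder 380 unlocking"));
}
```

Here pid 380 takes the lock and pid 64908 tries to release it, which are the two pids from the first panic line in the description.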
Comment 3 Kris Kennaway 2001-03-24 05:40:06 UTC
State Changed
From-To: open->closed

Submitter reports the problem was actually due to using nullfs.