Bug 20484

Summary: FreeBSD 4.0 crashes repeatedly: trap 12: page fault while in kernel mode
Product: Base System
Reporter: dl <dl>
Component: kern
Assignee: jlemon
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.0-STABLE   
Hardware: Any   
OS: Any   

Description dl 2000-08-08 15:30:01 UTC
The machine has crashed 4 times with this configuration,
always for the same reason (see below), but never
after any reproducible action. It was up for
more than 20 days in between, but then crashed
twice within a few days.

The machine runs as an IRCnet server in production and
is therefore a prime target for DoS attacks, but no
particular attack was registered (on routers, etc.)
when it crashed.

Here are the last two kgdb traces:

1.

(no debugging symbols found)...
IdlePTD 3964928
initial pcb at 3266a0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x20
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01bba4d
stack pointer           = 0x10:0xc02fb320
frame pointer           = 0x10:0xc02fb32c
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = 
trap number             = 12
panic: page fault

syncing disks... 

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x30
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0253680
stack pointer           = 0x10:0xc02fb158
frame pointer           = 0x10:0xc02fb15c
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = bio 
trap number             = 12
panic: page fault
Uptime: 23d2h32m25s
dumping to dev #da/0x20001, offset 524320
dump 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  0xc0161034 in boot ()
(kgdb) bt
#0  0xc0161034 in boot ()
#1  0xc01613b8 in poweroff_wait ()
#2  0xc02a77ad in trap_fatal ()
#3  0xc02a7485 in trap_pfault ()
#4  0xc02a7083 in trap ()
#5  0xc0253680 in acquire_lock ()
#6  0xc025735c in softdep_update_inodeblock ()
#7  0xc025296d in ffs_update ()
#8  0xc025a534 in ffs_sync ()
#9  0xc018d517 in sync ()
#10 0xc0160e07 in boot ()
#11 0xc01613b8 in poweroff_wait ()
#12 0xc02a77ad in trap_fatal ()
#13 0xc02a7485 in trap_pfault ()
#14 0xc02a7083 in trap ()
#15 0xc01bba4d in tcp_timer_persist ()
#16 0xc01667b1 in softclock ()

AND 2.

(no debugging symbols found)...
IdlePTD 3964928
initial pcb at 3266a0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x20
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01bba4d
stack pointer           = 0x10:0xc02fb320
frame pointer           = 0x10:0xc02fb32c
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = 
trap number             = 12
panic: page fault

syncing disks... 

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x30
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0253680
stack pointer           = 0x10:0xc02fb158
frame pointer           = 0x10:0xc02fb15c
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = bio 
trap number             = 12
panic: page fault
Uptime: 3d17h36m34s

dumping to dev #da/0x20001, offset 524320
dump 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  0xc0161034 in boot ()
(kgdb) bt
#0  0xc0161034 in boot ()
#1  0xc01613b8 in poweroff_wait ()
#2  0xc02a77ad in trap_fatal ()
#3  0xc02a7485 in trap_pfault ()
#4  0xc02a7083 in trap ()
#5  0xc0253680 in acquire_lock ()
#6  0xc025735c in softdep_update_inodeblock ()
#7  0xc025296d in ffs_update ()
#8  0xc025a534 in ffs_sync ()
#9  0xc018d517 in sync ()
#10 0xc0160e07 in boot ()
#11 0xc01613b8 in poweroff_wait ()
#12 0xc02a77ad in trap_fatal ()
#13 0xc02a7485 in trap_pfault ()
#14 0xc02a7083 in trap ()
#15 0xc01bba4d in tcp_timer_persist ()
#16 0xc01667b1 in softclock ()
(kgdb) quit


It is no mistake, but a fact, that the two reports are identical.
All pointers and addresses appear to be the same. For that reason
I guess hardware failure can be ruled out, but I have no experience
in analysing crash dumps, so I can't really tell.
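
A very low fault address such as 0x20 is commonly what a NULL structure
pointer plus a member offset looks like, which would fit a software bug
better than failing hardware. The sketch below only illustrates that
arithmetic. The structure layout is hypothetical and is NOT taken from
the FreeBSD sources; which pointer was actually NULL in
tcp_timer_persist() cannot be determined from this report.

```python
import ctypes


class HypotheticalStruct(ctypes.Structure):
    # Hypothetical layout for illustration only (not a real kernel struct):
    # 0x20 bytes of leading members, followed by the member being read.
    _fields_ = [
        ("leading_members", ctypes.c_uint8 * 0x20),
        ("member", ctypes.c_uint32),
    ]


def fault_address(base_pointer, field):
    """Address the CPU touches when reading `field` through `base_pointer`."""
    return base_pointer + field.offset


# Reading `member` through a NULL (0) pointer touches address 0 + offset.
addr = fault_address(0, HypotheticalStruct.member)
print(hex(addr))
```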

I'm also wondering whether, and why, _two_ panics seem to be
occurring. Or is this just regular behaviour during a crash dump?
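
On the two panics: read from the bottom up, the backtrace suggests that
the faults are nested rather than independent. The first fault is in
tcp_timer_persist(); the panic path then calls boot(), which tries to
sync the disks, and the second fault happens in acquire_lock() during
that sync. The following is a minimal sketch, not an official tool, that
extracts the interrupted frame below each trap() entry from the trace
above. The helper names parse_frames and faulting_frames are made up for
this illustration.

```python
import re

# Backtrace copied from the kgdb output in this report.
BACKTRACE = """\
#0  0xc0161034 in boot ()
#1  0xc01613b8 in poweroff_wait ()
#2  0xc02a77ad in trap_fatal ()
#3  0xc02a7485 in trap_pfault ()
#4  0xc02a7083 in trap ()
#5  0xc0253680 in acquire_lock ()
#6  0xc025735c in softdep_update_inodeblock ()
#7  0xc025296d in ffs_update ()
#8  0xc025a534 in ffs_sync ()
#9  0xc018d517 in sync ()
#10 0xc0160e07 in boot ()
#11 0xc01613b8 in poweroff_wait ()
#12 0xc02a77ad in trap_fatal ()
#13 0xc02a7485 in trap_pfault ()
#14 0xc02a7083 in trap ()
#15 0xc01bba4d in tcp_timer_persist ()
#16 0xc01667b1 in softclock ()
"""

FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+) in (\w+) \(\)$")


def parse_frames(text):
    """Return (frame number, address, function) tuples, innermost first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def faulting_frames(frames):
    """Return the frame interrupted by each trap() entry, oldest fault first."""
    hits = []
    for index, (_, _, name) in enumerate(frames):
        if name == "trap" and index + 1 < len(frames):
            hits.append(frames[index + 1])
    return list(reversed(hits))


faults = faulting_frames(parse_frames(BACKTRACE))
for number, address, name in faults:
    print(f"fault in {name} at {address} (frame #{number})")
```

The two addresses it reports match the two "instruction pointer" lines
in the panic messages (0xc01bba4d and 0xc0253680), so the second panic
looks like a consequence of the shutdown sync after the first one.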

As this is a production machine that should have long
uptimes, I have not built a debugging kernel yet, but
I will, to get more info from the next crash (if it happens).

Fix: 

No idea, BUT:

I've installed a BIOS upgrade to 1007, which seemed to solve the
problem with ECC RAM, and updated to 4.1-STABLE.
So the box is currently running with ECC enabled and 4.1-STABLE.
I cannot tell yet whether it will work from now on, but
I will keep you up to date.
How-To-Repeat: 
Well, I can't tell. It just happened, with no particular
action that could be identified as the trigger.
Comment 1 Sheldon Hearn freebsd_committer freebsd_triage 2000-08-08 15:49:35 UTC
Responsible Changed
From-To: freebsd-bugs->jlemon

tcp_timer_persist() seems to belong to Jonathan.
Comment 2 jlemon freebsd_committer freebsd_triage 2001-06-15 22:29:22 UTC
State Changed
From-To: open->closed

This bug was fixed in 4.1.1.