Using geom_cache with size>1000 leads to panics in g_up/g_down (g_cache_deliver/g_cache_read) panic44: full log http://paste.org.ru/?l1sjnk Tracing pid 13 tid 100014 td 0xfffffe0002c20460 kdb_enter() at kdb_enter+0x3b panic() at panic+0x180 g_cache_deliver() at g_cache_deliver+0x274 g_cache_done() at g_cache_done+0x80 g_io_schedule_up() at g_io_schedule_up+0xce g_up_procbody() at g_up_procbody+0x72 fork_exit() at fork_exit+0x135 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff8000275d00, rbp = 0 --- panic45: full log http://paste.org.ru/?egp7hk Tracing pid 13 tid 100015 td 0xfffffe0002c20000 g_cache_read() at g_cache_read+0xee g_cache_start() at g_cache_start+0x34e g_io_schedule_down() at g_io_schedule_down+0x1d4 g_down_procbody() at g_down_procbody+0x72 fork_exit() at fork_exit+0x135 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff800027ad00, rbp = 0 --- Fix: Using smaller size for gcache or turning it off with kern.geom.cache.enable=0 How-To-Repeat: Create geom cache with parameters like "gcache create -b128k -s2000 test". (seems gcache works stable if using size 100-200, but panics on 1000+). ports & src trees work good as test pattern for filesystem. panics occur during massive io operations: "find /mnt/test" or reading all files often leads to panic in g_down; dumping/restoring entire fs (ports+src) always triggers panic.
Trace from older revision (with full vmcore) Dump header from device /dev/ad4s1b Architecture: amd64 Architecture Version: 2 Dump Length: 2973687808B (2835 MB) Blocksize: 512 Dumptime: Wed Mar 16 03:33:59 2011 Hostname: cel.home Magic: FreeBSD Kernel Dump Version String: FreeBSD 9.0-CURRENT #0 r219141: Thu Mar 3 06:07:03 EET 2011 root@cel.home:/usr/obj/var/src9/sys/PDC90 Panic String: Dump Parity: 2510468568 Bounds: 32 Dump Status: good Fatal trap 9: general protection fault while in kernel mode cpuid = 2; apic id = 02 instruction pointer = 0x20:0xffffffff8135924e stack pointer = 0x28:0xffffff800007ab30 frame pointer = 0x28:0xffffff800007ab70 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 4 (g_down) Physical memory: 8175 MB Dumping 2835 MB: 2820 2804 2788 2772 ... (kgdb) #0 doadump () at /var/src9/sys/kern/kern_shutdown.c:250 #1 0xffffffff802cb7fc in db_fncall (dummy1=Variable "dummy1" is not available. ) at /var/src9/sys/ddb/db_command.c:548 #2 0xffffffff802cbb31 in db_command (last_cmdp=0xffffffff80bc5340, cmd_table=Variable "cmd_table" is not available. ) at /var/src9/sys/ddb/db_command.c:445 #3 0xffffffff802cbd80 in db_command_loop () at /var/src9/sys/ddb/db_command.c:498 #4 0xffffffff802cddf9 in db_trap (type=Variable "type" is not available. ) at /var/src9/sys/ddb/db_main.c:229 #5 0xffffffff8054e5b1 in kdb_trap (type=9, code=0, tf=0xffffff800007aa80) at /var/src9/sys/kern/subr_kdb.c:533 #6 0xffffffff807cd88d in trap_fatal (frame=0xffffff800007aa80, eva=Variable "eva" is not available. ) at /var/src9/sys/amd64/amd64/trap.c:798 #7 0xffffffff807cde5c in trap (frame=0xffffff800007aa80) at /var/src9/sys/amd64/amd64/trap.c:614 #8 0xffffffff807b86ff in calltrap () at /var/src9/sys/amd64/amd64/exception.S:228 #9 0xffffffff8135924e in g_cache_read (sc=0xfffffe0008851800, bp=0xfffffe019f8e2488) at /var/src9/sys/modules/geom/geom_cache/../../../geom/cache/g_cache.c:116 #10 0xffffffff813597f2 in g_cache_start (bp=0xfffffe019f8e2488) at /var/src9/sys/modules/geom/geom_cache/../../../geom/cache/g_cache.c:377 #11 0xffffffff804b82c4 in g_io_schedule_down (tp=Variable "tp" is not available. ) at /var/src9/sys/geom/geom_io.c:632 #12 0xffffffff804b86af in g_down_procbody () at /var/src9/sys/geom/geom_kern.c:118 #13 0xffffffff804ee58a in fork_exit ( callout=0xffffffff804b8640 <g_down_procbody>, arg=0x0, frame=0xffffff800007ac50) at /var/src9/sys/kern/kern_fork.c:885 #14 0xffffffff807b8bae in fork_trampoline () at /var/src9/sys/amd64/amd64/exception.S:569 #15 0x0000000000000000 in ?? () #16 0x0000000000000000 in ?? () #17 0x0000000000000001 in ?? () #18 0x0000000000000000 in ?? () #19 0x0000000000000000 in ?? () (kgdb) fr 9 #9 0xffffffff8135924e in g_cache_read (sc=0xfffffe0008851800, bp=0xfffffe019f8e2488) at /var/src9/sys/modules/geom/geom_cache/../../../geom/cache/g_cache.c:116 116 TAILQ_REMOVE(&sc->sc_usedlist, dp, d_used); (kgdb) p sc->sc_usedlist $1 = {tqh_first = 0xfffffe00a451e680, tqh_last = 0xfffffe00a451e6b8} (kgdb) p dp $2 = (struct g_cache_desc *) 0xfffffe00a451e680 (kgdb) p d_used No symbol "d_used" in current context. (kgdb) x 0xfffffe00a451e680 0xfffffe00a451e680: 0xdeadc0de (kgdb) x 0xfffffe00a451e6b8 0xfffffe00a451e6b8: 0xdeadc0de (kgdb) x dp 0xfffffe0008851800: 0x069cd300 (kgdb) x 0xfffffe019f8e2488 0xfffffe019f8e2488: 0x00000001
Responsible Changed From-To: freebsd-amd64->freebsd-geom Over to maintainer(s).
Still affects on BETA1. Thu Sep 8 13:58:49 EEST 2011 FreeBSD pp.fss.sunpp 9.0-BETA1 FreeBSD 9.0-BETA1 #0 r224843: Fri Aug 26 16:53:42 EEST 2011 root@pp.fss.sunpp:/usr/obj/usr/src/sys/PDC90 amd64 panic: Unread portion of the kernel message buffer: Fatal trap 9: general protection fault while in kernel mode cpuid = 0; apic id = 00 instruction pointer = 0x20:0xffffffff8110a130 stack pointer = 0x28:0xffffff800024db40 frame pointer = 0x28:0xffffff800024db80 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 13 (g_down) Dumping 1080 out of 4079 MB:..2%..11%..21%..32%..42%..51%..61%..72%..82%..91% #6 0xffffffff807ee40d in trap_fatal (frame=0xffffff800024da90, eva=Variable "eva" is not available. ) at /usr/src/sys/amd64/amd64/trap.c:800 #7 0xffffffff807ee9d5 in trap (frame=0xffffff800024da90) at /usr/src/sys/amd64/amd64/trap.c:616 #8 0xffffffff807d93bf in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228 #9 0xffffffff8110a130 in g_cache_read (sc=0xfffffe00097c7200, bp=0xfffffe00c5f75570) at /usr/src/sys/modules/geom/geom_cache/../../../geom/cache/g_cache.c:117 (kgdb) fr 9 #9 0xffffffff8110a130 in g_cache_read (sc=0xfffffe00097c7200, bp=0xfffffe00c5f75570) at /usr/src/sys/modules/geom/geom_cache/../../../geom/cache/g_cache.c:117 117 TAILQ_REMOVE(&sc->sc_usedlist, dp, d_used); (kgdb) p sc->sc_usedlist $2 = {tqh_first = 0xfffffe0083a7b980, tqh_last = 0xfffffe0083a7b9b8} (kgdb) p *(struct g_cache_desc *) dp $6 = {d_bno = -2401050962867404578, d_data = 0xdeadc0dedeadc0de <Address 0xdeadc0dedeadc0de out of bounds>, d_biolist = 0xdeadc0dedeadc0de, d_atime = -2401050962867404578, d_flags = -559038242, d_next = {le_next = 0xdeadc0dedeadc0de, le_prev = 0xdeadc0dedeadc0de}, d_used = { tqe_next = 0xdeadc0dedeadc0de, tqe_prev = 0xdeadc0dedeadc0de}} (kgdb) p d_used No symbol "d_used" in current context. #10 0xffffffff8110a6f2 in g_cache_start (bp=0xfffffe00c5f75570) at /usr/src/sys/modules/geom/geom_cache/../../../geom/cache/g_cache.c:378 #11 0xffffffff804c6d43 in g_io_schedule_down (tp=Variable "tp" is not available. ) at /usr/src/sys/geom/geom_io.c:632 #12 0xffffffff804c71d2 in g_down_procbody (arg=Variable "arg" is not available. ) at /usr/src/sys/geom/geom_kern.c:110 #13 0xffffffff804fe685 in fork_exit ( callout=0xffffffff804c7160 <g_down_procbody>, arg=0x0, frame=0xffffff800024dc50) at /usr/src/sys/kern/kern_fork.c:941 #14 0xffffffff807d98ee in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:603 ---- More detailed core.txt available at http://bak.sg01.com/tmp/panic55a.txt
more correct printing (kgdb) p *sc $6 = {sc_geom = 0xfffffe00097bb300, sc_type = 1, sc_bshift = 19, sc_bsize = 524288, sc_tail = 33822343168, sc_mtx = {lock_object = { lo_name = 0xffffffff8110bd5a "GEOM CACHE mutex", lo_flags = 16973824, lo_data = 0, lo_witness = 0xffffff8000651c80}, mtx_lock = 18446741874726241376}, sc_callout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xffffff807d5412c0}}, c_time = 267371554, c_arg = 0xfffffe00097c7200, c_func = 0xffffffff81109950 <g_cache_go>, c_lock = 0xfffffe00097c7220, c_flags = 6, c_cpu = 0}, sc_desclist = {{lh_first = 0xfffffe0083a7b300}, {lh_first = 0xfffffe0102d0f900}, {lh_first = 0xfffffe004c53bc00}, { lh_first = 0xfffffe0083a7b400}, {lh_first = 0xfffffe0017725600}, {lh_first = 0xfffffe00096c2480}, {lh_first = 0xfffffe0009814a00}, { lh_first = 0xfffffe0083a7b080}}, sc_usedlist = {tqh_first = 0xfffffe0083a7b980, tqh_last = 0xfffffe0083a7b9b8}, sc_zone = 0xfffffe000974dc80, sc_maxent = 100, sc_nent = 39, sc_nused = 1, sc_invalid = 0, sc_reads = 300805, sc_readbytes = 2057474048, sc_cachereads = 4918, sc_cachereadbytes = 43359232, sc_cachehits = 3857, sc_cachemisses = 78918, sc_cachefull = 77857, sc_writes = 380543, sc_wrotebytes = 8069953536}
Panics occur when more than one instance of gcache are active. So, having 2 geom caches for /usr and /usr/obj, and making install of ports with WRKDIRPREFIX=/usr/obj/work leads to panic, no matter which size of geom caches used. But single active instance of gcache for /usr/obj seems to show no problems ("inactive" instances are present, but not used to mount filesystems).
Increasing kern.geom.cache.timeout sysctl (from 10 to 30-40 secs) rises the panic risk. Increasing the block size forces cache usage & overfill. Problem mostly appears after number of CacheFull events, cache memory release and filling it again with read operations. So sometimes such situation comes (Entries > Size!): # gcache list usr0 Geom name: usr0 WroteBytes: 2001752576 Writes: 93866 CacheFull: 65 CacheMisses: 1387 CacheHits: 4120 CacheReadBytes: 22902272 CacheReads: 5442 ReadBytes: 27303424 Reads: 6262 InvalidEntries: 0 UsedEntries: 0 Entries: 1025 TailOffset: 21474705408 BlockSize: 131072 Size: 1024 Few more info from the fresh stable version: ----- Architecture: amd64 Architecture Version: 1 Dump Length: 833024B (0 MB) Blocksize: 512 Dumptime: Mon Aug 6 04:54:04 2012 Hostname: cel.home Magic: FreeBSD Text Dump Version String: FreeBSD 9.1-PRERELEASE #1 r239036M: Sat Aug 4 15:38:54 EEST 2012 root@cel.home:/usr/obj/usr/src/sys/PDC90 Panic String: Bad tailq NEXT(0xfffffe0012be7ab8->tqh_last) != NULL Dump Parity: 3722464325 Bounds: 148 Dump Status: good panic: Bad tailq NEXT(0xfffffe0012be7ab8->tqh_last) != NULL cpuid = 0 KDB: stack backtrace: db_trace_self_wrapper() at 0xffffffff802defca = db_trace_self_wrapper+0x2a kdb_backtrace() at 0xffffffff805a52e7 = kdb_backtrace+0x37 panic() at 0xffffffff8056e088 = panic+0x1d8 g_cache_deliver() at 0xffffffff81038b46 = g_cache_deliver+0x276 g_cache_read() at 0xffffffff81038d13 = g_cache_read+0x1a3 g_cache_start() at 0xffffffff8103920e = g_cache_start+0x34e g_io_schedule_down() at 0xffffffff80506c24 = g_io_schedule_down+0x1d4 g_down_procbody() at 0xffffffff805070a2 = g_down_procbody+0x72 fork_exit() at 0xffffffff80540565 = fork_exit+0x135 fork_trampoline() at 0xffffffff807fbeae = fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff800027fcf0, rbp = 0 --- KDB: enter: panic db:1:on> run lockinfo db:2:lockinfo> show locks exclusive sleep mutex GEOM CACHE mutex (GEOM CACHE mutex) r = 0 (0xfffffe0012be7a20) locked @ /usr/src/sys/modules/geom/geom_cache/../../../geom/cache/g_cache.c:262 db:2:locks> show alllocks Process 77823 (perl5.14.2) thread 0xfffffe01b6d288e0 (131440) exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0214f6a0f8) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148 Process 77822 (perl5.14.2) thread 0xfffffe0136bbf470 (100575) exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe00b9a488f0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148 Process 77821 (perl5.14.2) thread 0xfffffe0184d158e0 (150562) exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe00417a93a0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148 Process 77820 (perl5.14.2) thread 0xfffffe0118db5470 (149413) exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe006bfa50f8) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148 Process 77819 (perl5.14.2) thread 0xfffffe020654f000 (102636) exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe00410abe40) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148 Process 77818 (perl5.14.2) thread 0xfffffe02066428e0 (127656) exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe01817cdb98) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148 Process 77817 (perl5.14.2) thread 0xfffffe0012ca1470 (100117) exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0172cf8b98) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148 Process 6250 (darktable) thread 0xfffffe01816de000 (100720) exclusive sleep mutex pmap (pmap) r = 0 (0xfffffe01810a6a60) locked @ /usr/src/sys/amd64/amd64/pmap.c:3533 exclusive rw pvh global (pvh global) r = 0 (0xffffffff80e4cc00) locked @ /usr/src/sys/amd64/amd64/pmap.c:3532 exclusive sleep mutex vm object (standard object) r = 0 (0xfffffe0181366000) locked @ /usr/src/sys/vm/vm_fault.c:1067 shared sx user map (user map) r = 0 (0xfffffe01810a69a8) locked @ /usr/src/sys/vm/vm_fault.c:823 Process 6250 (darktable) thread 0xfffffe01816de470 (100718) shared sx user map (user map) r = 0 (0xfffffe01810a69a8) locked @ /usr/src/sys/vm/vm_fault.c:823 Process 3201 (mysqld) thread 0xfffffe00bec8e8e0 (133280) exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0181d6cb98) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148 Process 3201 (mysqld) thread 0xfffffe00410f2000 (100545) exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe018119c3a0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148 Process 13 (geom) thread 0xfffffe0010177000 (100016) exclusive sleep mutex GEOM CACHE mutex (GEOM CACHE mutex) r = 0 (0xfffffe0012be7a20) locked @ /usr/src/sys/modules/geom/geom_cache/../../../geom/cache/g_cache.c:262 db:2:alllocks> show lockedvnods Locked vnodes db:1:lockedvnods> show pcpu cpuid = 0 dynamic pcpu = 0x538a00 curthread = 0xfffffe0010177000: pid 13 "g_down" curpcb = 0xffffff800027fd00 fpcurthread = none idlethread = 0xfffffe001015d470: tid 100003 "idle: cpu0" curpmap = 0xffffffff80c30190 tssp = 0xffffffff80e37fe0 commontssp = 0xffffffff80e37fe0 rsp0 = 0xffffff800027fd00 gs32p = 0xffffffff80e36138 ldt = 0xffffffff80e36178 tss = 0xffffffff80e36168 spin locks held: db:1:pcpu> bt Tracing pid 13 tid 100016 td 0xfffffe0010177000 kdb_enter() at 0xffffffff805a4fab = kdb_enter+0x3b panic() at 0xffffffff8056e081 = panic+0x1d1 g_cache_deliver() at 0xffffffff81038b46 = g_cache_deliver+0x276 g_cache_read() at 0xffffffff81038d13 = g_cache_read+0x1a3 g_cache_start() at 0xffffffff8103920e = g_cache_start+0x34e g_io_schedule_down() at 0xffffffff80506c24 = g_io_schedule_down+0x1d4 g_down_procbody() at 0xffffffff805070a2 = g_down_procbody+0x72 fork_exit() at 0xffffffff80540565 = fork_exit+0x135 fork_trampoline() at 0xffffffff807fbeae = fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff800027fcf0, rbp = 0 --- db:1:bt> -- Using Opera's revolutionary email client: http://www.opera.com/mail/
I have a similar situation but on the 10.0-BETA3 http://docs.freebsd.org/cgi/mid.cgi?20131112112825.0be85aee
Today I had the same panic with 8.4-STABLE and single gcache instance serving disk slice containing the system. Unfortunally, crashdump was not generated but I have screenshot with KDB_TRACE: http://www.grosbein.net/img/panic-gcache.jpg After reboot: # gcache list Geom name: ca0 WroteBytes: 0 Writes: 0 CacheFull: 0 CacheMisses: 20 CacheHits: 88 CacheReadBytes: 324096 CacheReads: 108 ReadBytes: 328192 Reads: 116 InvalidEntries: 0 UsedEntries: 9 Entries: 9 TailOffset: 500107182080 BlockSize: 131072 Size: 1024 Providers: 1. Name: cache/ca0 Mediasize: 500107280896 (465G) Sectorsize: 512 Mode: r0w0e0 Consumers: 1. Name: mirror/gm0s1 Mediasize: 500107281408 (465G) Sectorsize: 512 Stripesize: 0 Stripeoffset: 258048 Mode: r0w0e0
batch change: For bugs that match the following - Status Is In progress AND - Untouched since 2018-01-01. AND - Affects Base System OR Documentation DO: Reset to open status. Note: I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.