Summary: | geom(4): raid/md_jmicron.c: jmicron RAID taste code can panic if conf is garbage | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Robert Morris <rtm> | ||||
Component: | kern | Assignee: | freebsd-geom (Nobody) <geom> | ||||
Status: | Open --- | ||||||
Severity: | Affects Some People | CC: | emaste, fs, markj, mav | ||||
Priority: | --- | Keywords: | crash, needs-qa | ||||
Version: | Unspecified | Flags: | koobs:
maintainer-feedback?
(markj) koobs: mfc-stable13? koobs: mfc-stable12? |
||||
Hardware: | Any | ||||||
OS: | Any | ||||||
Attachments: |
|
Not sure what to do about this - we can easily add a check for this specific case but should really review all of this for robustness. Adding mav for commentary; any sense of how commonly jmicron RAID metadata is used? I believe software RAIDs are generally not widely used nowadays with ZFS used more and more often. And JMicron is obviously not the leader of the market, comparing to Intel and AMD, or even Marvell SATA RAIDs (the last work on level of firmware, but take bigger part of the SATA market). So if somebody make a nice workaround for the specific case -- I'd take a look, otherwise I am not sure what can we do about it, I have bigger fish to fry to look deep on this. ^Triage: Mark may have ideas given recent src 9e9ba9c73de9 ^Triage: Uh, base 9e9ba9c73de9 Looks like there are several problems here. First, the tasting code assumes that disk IDs are non-zero, since jmicron_meta_total_disks() terminates its search when an array entry is zero. disk_ids seem to be assigned using arc4random(), so most of the time they'll be non-zero. :) jmicron_meta_find_disk() skips over zeroed entries though, so it's not consistent with total_disks(). I see another bug in the jmicron taste routine, in this line: spare = (disk_pos == -2) ? 1 : 0; It's impossible for disk_pos to equal -2 here, I suspect it should be -3. I'm not sure how best to fix all of this. Something like this fixes the test case, but I'm sure it's incomplete, and could potentially break existing setups, though I suspect that's unlikely: diff --git a/sys/geom/raid/md_jmicron.c b/sys/geom/raid/md_jmicron.c index 939e05f78017..faa7b1cbb40e 100644 --- a/sys/geom/raid/md_jmicron.c +++ b/sys/geom/raid/md_jmicron.c @@ -249,6 +249,8 @@ jmicron_meta_find_disk(struct jmicron_raid_conf *meta, uint32_t id) int pos; id &= JMICRON_DISK_MASK; + if (id == 0) + return (-1); for (pos = 0; pos < JMICRON_MAX_DISKS; pos++) { if ((meta->disks[pos] & JMICRON_DISK_MASK) == id) return (pos); |
Created attachment 233800 [details] disk image that causes jmicron taste code to panic During tasting, if the last sector of a newly attached drive looke enough like a jmicron_raid_conf but meta->disks[] contains no disks, and meta->spare[] contains one disk, g_raid_md_jmicron_new_disk() may call g_raid_md_jmicron_start() anyway, leading to a panic. I've attached a demo disk image, which causes this code in g_raid_md_jmicron_new_disk() to execute with disks_present = 1 (the disk being tasted), total_disks = zero (from meta->disks[]), and total_spare() = one (from meta->spare[]): /* If we collected all needed disks - start array. */ if (mdi->mdio_disks_present == mdi->mdio_total_disks + jmicron_meta_total_spare(mdi->mdio_meta)) g_raid_md_jmicron_start(sc); Mounting the demo disk image: # uname -a FreeBSD 14.0-CURRENT FreeBSD 14.0-CURRENT #219 main-n250919-29f81bc20825-dirty: Sat May 7 16:30:27 EDT 2022 rtm@xxx:/usr/obj/usr/rtm/symbsd/src/riscv.riscv64/sys/RTM riscv # mdconfig -f taste8a.img GEOM_RAID: JMicron-0000000: Array JMicron-0000000 created. GEOM_RAID: JMicron-0000000: No transformation module found for ��. GEOM_RAID: JMicron-0000000: Volume �� state changed from STARTING to UNSUPPORTED panic: No disk at position 0! cpuid = 0 time = 1651920087 KDB: stack backtrace: db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x38 kdb_backtrace() at kdb_backtrace+0x2c vpanic() at vpanic+0x16e panic() at panic+0x2a g_raid_md_jmicron_start_disk() at g_raid_md_jmicron_start_disk+0x350 g_raid_md_jmicron_start() at g_raid_md_jmicron_start+0x1c2 g_raid_md_jmicron_new_disk() at g_raid_md_jmicron_new_disk+0x110 g_raid_md_taste_jmicron() at g_raid_md_taste_jmicron+0x39a G_RAID_MD_TASTE() at G_RAID_MD_TASTE+0x5a g_raid_taste() at g_raid_taste+0x15c g_new_provider_event() at g_new_provider_event+0xb8 one_event() at one_event+0x106 g_run_events() at g_run_events+0x8a g_event_procbody() at g_event_procbody+0x56 fork_exit() at fork_exit+0x80 fork_trampoline() at fork_trampoline+0xa KDB: enter: panic [ thread pid 13 tid 100017 ] Stopped at breakpoint+0xa: c.ldsp s0,0(sp) db>