Bug 263853

Summary:

geom(4): raid/md_jmicron.c: jmicron RAID taste code can panic if conf is garbage

Product:

Base System

Reporter:

Robert Morris <rtm>

Component:

kern

Assignee:

freebsd-geom (Nobody) <geom>

Status:

Open ---

Severity:

Affects Some People

CC:

emaste, fs, markj, mav

Priority:

---

Keywords:

crash, needs-qa

Version:

Unspecified

Flags:

koobs: maintainer-feedback? (markj)
koobs: mfc-stable13?
koobs: mfc-stable12?

Hardware:

Any

OS:

Any

Attachments:

Description	Flags
disk image that causes jmicron taste code to panic	none

Description Robert Morris 2022-05-07 20:55:59 UTC

Created attachment 233800 [details]
disk image that causes jmicron taste code to panic

During tasting, if the last sector of a newly attached drive looke
enough like a jmicron_raid_conf but meta->disks[] contains no disks,
and meta->spare[] contains one disk, g_raid_md_jmicron_new_disk() may
call g_raid_md_jmicron_start() anyway, leading to a panic.

I've attached a demo disk image, which causes this code in
g_raid_md_jmicron_new_disk() to execute with disks_present = 1 (the
disk being tasted), total_disks = zero (from meta->disks[]), and
total_spare() = one (from meta->spare[]):

          /* If we collected all needed disks - start array. */
          if (mdi->mdio_disks_present == mdi->mdio_total_disks +
              jmicron_meta_total_spare(mdi->mdio_meta))
            g_raid_md_jmicron_start(sc);

Mounting the demo disk image:

# uname -a
FreeBSD  14.0-CURRENT FreeBSD 14.0-CURRENT #219 main-n250919-29f81bc20825-dirty: Sat May  7 16:30:27 EDT 2022     rtm@xxx:/usr/obj/usr/rtm/symbsd/src/riscv.riscv64/sys/RTM  riscv
# mdconfig -f taste8a.img
GEOM_RAID: JMicron-0000000: Array JMicron-0000000 created.
GEOM_RAID: JMicron-0000000: No transformation module found for ï¿½ï¿½.
GEOM_RAID: JMicron-0000000: Volume ï¿½ï¿½ state changed from STARTING to UNSUPPORTED                                                                            
panic: No disk at position 0!
cpuid = 0
time = 1651920087
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x16e
panic() at panic+0x2a
g_raid_md_jmicron_start_disk() at g_raid_md_jmicron_start_disk+0x350
g_raid_md_jmicron_start() at g_raid_md_jmicron_start+0x1c2
g_raid_md_jmicron_new_disk() at g_raid_md_jmicron_new_disk+0x110
g_raid_md_taste_jmicron() at g_raid_md_taste_jmicron+0x39a
G_RAID_MD_TASTE() at G_RAID_MD_TASTE+0x5a
g_raid_taste() at g_raid_taste+0x15c
g_new_provider_event() at g_new_provider_event+0xb8
one_event() at one_event+0x106
g_run_events() at g_run_events+0x8a
g_event_procbody() at g_event_procbody+0x56
fork_exit() at fork_exit+0x80
fork_trampoline() at fork_trampoline+0xa
KDB: enter: panic
[ thread pid 13 tid 100017 ]
Stopped at      breakpoint+0xa: c.ldsp  s0,0(sp)
db>

Comment 1 Ed Maste freebsd_committer

2022-05-09 16:45:43 UTC

Not sure what to do about this - we can easily add a check for this specific case but should really review all of this for robustness. Adding mav for commentary; any sense of how commonly jmicron RAID metadata is used?

Comment 2 Alexander Motin freebsd_committer

2022-05-09 17:04:17 UTC

I believe software RAIDs are generally not widely used nowadays with ZFS used more and more often.  And JMicron is obviously not the leader of the market, comparing to Intel and AMD, or even Marvell SATA RAIDs (the last work on level of firmware, but take bigger part of the SATA market).  So if somebody make a nice workaround for the specific case -- I'd take a look, otherwise I am not sure what can we do about it, I have bigger fish to fry to look deep on this.

Comment 3 Kubilay Kocak freebsd_committer

2022-05-15 02:23:46 UTC

^Triage: Mark may have ideas given recent src 9e9ba9c73de9

Comment 4 Kubilay Kocak freebsd_committer

2022-05-15 02:24:07 UTC

^Triage: Uh, base 9e9ba9c73de9

Comment 5 Mark Johnston freebsd_committer

2022-05-19 15:07:02 UTC

Looks like there are several problems here.  First, the tasting code assumes that disk IDs are non-zero, since jmicron_meta_total_disks() terminates its search when an array entry is zero.  disk_ids seem to be assigned using arc4random(), so most of the time they'll be non-zero. :)

jmicron_meta_find_disk() skips over zeroed entries though, so it's not consistent with total_disks().

I see another bug in the jmicron taste routine, in this line:

  spare = (disk_pos == -2) ? 1 : 0;

It's impossible for disk_pos to equal -2 here, I suspect it should be -3.

I'm not sure how best to fix all of this.  Something like this fixes the test case, but I'm sure it's incomplete, and could potentially break existing setups, though I suspect that's unlikely:

diff --git a/sys/geom/raid/md_jmicron.c b/sys/geom/raid/md_jmicron.c
index 939e05f78017..faa7b1cbb40e 100644
--- a/sys/geom/raid/md_jmicron.c
+++ b/sys/geom/raid/md_jmicron.c
@@ -249,6 +249,8 @@ jmicron_meta_find_disk(struct jmicron_raid_conf *meta, uint32_t id)
        int pos;
 
        id &= JMICRON_DISK_MASK;
+       if (id == 0)
+               return (-1);
        for (pos = 0; pos < JMICRON_MAX_DISKS; pos++) {
                if ((meta->disks[pos] & JMICRON_DISK_MASK) == id)
                        return (pos);