Bug 253787 - Install error "No free space left on device." on Dell T610 w/H700 RAID 10 with 11.1TB
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
Reported: 2021-02-23 07:06 UTC by rallenh
Modified: 2021-02-25 02:36 UTC

Attachments
6 x 2.8TB RAID 10 (8.3TB) EFI serial console boot log (7.26 KB, text/plain)
2021-02-23 07:07 UTC, rallenh
8 x 2.8TB RAID 10 (11.1TB) EFI serial console boot log (6.77 KB, text/plain)
2021-02-23 07:08 UTC, rallenh

Description rallenh 2021-02-23 07:06:20 UTC
Hardware:
Dell T610 (in UEFI mode) 64GB RAM (in Advanced ECC mode) with H700 1GB cache w/BBU (Firmware Package Version: 12.10.7-0001) and 8 x 3TB (2.8TB) Dell (Hitachi) near line (7.2K) SAS 6 Gb/s drives.

RAID 10 Volume is empty and initialized. FreeBSD is the only OS.

When I configure all 8 drives in RAID 10 (11,176GB), I get an error during install:
No free space left on device.

Steps:
 1. Boot from 12.2-STABLE CD1 ISO (FreeBSD-12.2-STABLE-amd64-20210211-r369250-disc1.iso)
 2. at Welcome to FreeBSD!: Install (hit Enter)
 3. at Keymap Selection: Select (hit Enter)
 4. at Set Hostname: "local" (type in l-o-c-a-l)
 5. at Distribution Select: OK (hit Enter)
 6. at Partitioning: Auto (UFS) Guided Disk Setup (hit Enter)
 7. at Partition: Entire Disk (hit Enter)
 8. at Partition Scheme: GPT GUID Partition Table (hit Enter)
 9. Error: No free space left on device. <OK> (hit Enter)
10. at Partition Editor: Finish (hit Enter)
11. watch it install and set empty password
12. reboot
13. at loader: "Failed to load kernel 'kernel'"

While I am at the loader, I'll check on devices:
Type '?' for a list of commands, 'help' for more detailed help.
OK lsdev
cd devices:
    cd0:    0 blocks (no media)
disk devices:
    disk0:    23437770752 X 512 blocks
      disk0p1: EFI
      disk0p2: FreeBSD UFS
http: (unknown)
net devices:
    net0:
    net1:
OK ls disk0p2:
open 'disk0p2:/' failed: no such file or directory
OK

Notice that the "Auto" partitioning created no swap partition. I can't `ls` partition #2 of the boot disk, even though the loader identifies it as a FreeBSD UFS partition.

On the same server, if I change the RAID 10 from 8 drives to 6 drives, FreeBSD installs and reboots without issues (I am using it this way now). With the 6-disk setup, the RAID 10 is 8,382GB in size. I can also `ls` the disk from the loader prompt:
Type '?' for a list of commands, 'help' for more detailed help.
OK lsdev
cd devices:
    cd0:    0 blocks (no media)
disk devices:
    disk0:    17578328064 X 512 blocks
      disk0p1: EFI
      disk0p2: FreeBSD UFS
      disk0p3: FreeBSD swap
http: (unknown)
net devices:
    net0:
    net1:
OK ls disk0p2:
disk0p2:/
 d  .snap
 d  dev
 {...}
    COPYRIGHT
 d  media
    entropy
OK ls disk0p2:/boot/kernel/
disk0p2:/boot/kernel/
    wlan_amrr.ko
    ng_pred1.ko
    {...}
    snd_envy24.ko
    snd_ds1.ko
OK
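
As a sanity check on the sizes quoted above, the block counts reported by lsdev can be converted to GiB. This is only a minimal sketch: the helper name is made up for illustration, and the only inputs are the two block counts and the 512-byte block size shown in the loader output.

```python
# Convert the block counts printed by `lsdev` into GiB (2**30 bytes).
# blocks_to_gib is a hypothetical helper name, not part of any FreeBSD tool.

BLOCK_SIZE = 512  # bytes per block, as reported by lsdev


def blocks_to_gib(blocks: int, block_size: int = BLOCK_SIZE) -> float:
    """Return the size in GiB of a device with the given block count."""
    return blocks * block_size / 2**30


volumes = {
    "8 x 2.8TB RAID 10 (fails)": 23437770752,
    "6 x 2.8TB RAID 10 (works)": 17578328064,
}

for name, blocks in volumes.items():
    print(f"{name}: {blocks_to_gib(blocks):,.0f} GiB")
```

The results are 11,176 GiB and 8,382 GiB, which match the sizes reported for the two RAID 10 volumes, so the "GB" figures in this report are binary units.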

I was able to find someone else reporting the same issue, albeit on the generation of Dell PowerEdge servers (12th) that came after the T610 (11th gen):
https://forums.freebsd.org/threads/cant-load-kernel-after-fresh-install.40810/

The OP changed the partition size from max (25TB) to 250GB to get FreeBSD to install and reboot successfully.

I'll upload serial console captures of the broken configuration with 11.1TB and from a working configuration with 8.3TB. Both captures are from the same box, the only difference being the working configuration has 6 x 2.8TB drives in RAID 10 (8.3TB total) and the broken configuration is all 8 x 2.8TB drives in RAID 10 (11.1TB total).
Comment 1 rallenh 2021-02-23 07:07:59 UTC
Created attachment 222747
6 x 2.8TB RAID 10 (8.3TB) EFI serial console boot log
Comment 2 rallenh 2021-02-23 07:08:26 UTC
Created attachment 222748
8 x 2.8TB RAID 10 (11.1TB) EFI serial console boot log
Comment 3 VVD 2021-02-23 09:42:42 UTC
Can you try installing with EFI boot mode off?
Comment 4 rallenh 2021-02-24 00:29:06 UTC
I get the same outcome in BIOS boot mode: Failed to load kernel 'kernel' when I use all 8 spindles (11.1TB). The BIOS boot configuration worked fine with 6 x 2.8TB RAID 10 (8.3TB, two hot spares).

The only hardware configuration I changed was going from 6 x 2.8TB RAID 10 to 8 x 2.8TB RAID 10 (with a re-init), followed by a re-install and reboot. I did this for both the BIOS and the UEFI boot configuration, with the same results. I have left the box in UEFI mode because I now have a USB flash drive loaded with FreeBSD to rescue it if it doesn't reboot. The flash drive is plugged into an internal connector. It's easier to boot from it in UEFI mode than in BIOS mode, and I get a serial console; I just have to hit F11 and choose the flash drive.

I originally thought this was an EFI issue and created a ticket under that working assumption: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249045. I was using an 8 x 2.8TB RAID 6 (16TB) configuration then. At that point I didn't suspect the size of the RAID volume, so I didn't experiment with different RAID configurations.

I no longer think this is EFI related, for two reasons: it does the same thing in BIOS boot mode, and I get an error _during_ install with 12.2-STABLE (r369250). In the previous ticket, I used 12.1-RELEASE and didn't get any errors during install.
Comment 5 VVD 2021-02-24 14:33:04 UTC
As a workaround: try installing the system on, and booting from, another disk.
It's "best practice" to use small disks in RAID1 for the system and the big RAID for data only.
And why not a ZFS stripe of 4 mirrors (with the HDDs in pass-through or JBOD mode)?
Comment 6 rallenh 2021-02-25 02:36:09 UTC
I have a workaround now: 6 (instead of 8) spindles in RAID 10 with two hot spares for 8.3TB.

Use "best practices" vs taking the defaults from the FreeBSD Installer? Isn't taking the installer's defaults a best practice of sorts?

It seems "Auto (UFS) Guided Disk Setup" doesn't work in all use cases?

There are SATA drives bigger than 11.1TB these days. This isn't a big RAID setup.

I am not a fan of ZFS. I'll pass, thank you.

AFAIK, the controller I am using doesn't support JBOD, pass-through or a reflash to IT firmware. There are other Dell controllers that do support those features; the H700/H800 do not.

The error I get from bsdinstall (calling autopart) is here:
https://github.com/freebsd/freebsd-src/blob/master/usr.sbin/bsdinstall/partedit/gpart_ops.c#L1064

It looks like geom is valid (there are a few guards), but the value returned through &firstfree could be the issue? Or something in gpart_max_free()?

This commit seems sensible and maybe this is somewhere for me to start looking?
https://github.com/freebsd/freebsd-src/commit/4af559e39b295c6950e25663f638d1737808a205

I wonder if the H700 is doing some funny business with sector sizes when the volume is over a specific size?
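
One simple thing that can be checked from the numbers alone is whether a power-of-two limit sits between the working and the failing volume size. The sketch below assumes the 512-byte blocks printed by lsdev (the controller's real sector size is not known here), and the helper name is made up for illustration.

```python
# Check whether a power-of-two limit (in blocks or in bytes) falls between
# the working and the failing volume sizes. Assumes 512-byte blocks, as
# printed by lsdev; this says nothing about what the H700 does internally.

BLOCK_SIZE = 512
WORKING_BLOCKS = 17578328064  # 6-drive RAID 10, installs fine
FAILING_BLOCKS = 23437770752  # 8-drive RAID 10, "No free space left on device."


def powers_of_two_between(low: int, high: int) -> list[int]:
    """Return every exponent n with low < 2**n <= high."""
    candidates = range(low.bit_length() - 1, high.bit_length() + 1)
    return [n for n in candidates if low < 2**n <= high]


in_blocks = powers_of_two_between(WORKING_BLOCKS, FAILING_BLOCKS)
in_bytes = powers_of_two_between(WORKING_BLOCKS * BLOCK_SIZE,
                                 FAILING_BLOCKS * BLOCK_SIZE)

print("power-of-two block counts crossed:", in_blocks)
print("power-of-two byte sizes crossed:", in_bytes)
```

Both lists come out empty: the two block counts sit between 2^34 and 2^35, and the two byte sizes sit between 2^43 and 2^44. So, under the 512-byte assumption, a plain power-of-two overflow does not separate the two configurations. That does not rule out other thresholds or a different sector size being reported to the installer.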

Btw, I would say that there's a bug in bsdinstall here:
https://github.com/freebsd/freebsd-src/blob/de1aa3dab23c06fec962a14da3e7b4755c5880cf/usr.sbin/bsdinstall/scripts/auto#L328

The line `bsdinstall autopart || error "Partitioning error"` is failing in my case. Perhaps the error isn't making it back to the parent shell?

I should be getting the error "Partitioning error" but I am not. I am getting the error from partedit/autopart.
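
A minimal sketch of how that could happen, written in Python rather than sh and not taken from bsdinstall: if a child process shows an error to the user but still exits with status 0, a `cmd || error "..."` fallback in the parent never runs. The child command below is a stand-in, and whether autopart really exits 0 in this situation is an assumption that would need to be checked against the source.

```python
import subprocess
import sys

# Stand-in for a partitioning step that reports an error to the user but
# still exits with status 0. This is NOT bsdinstall code; it only models
# the hypothesis that the failure is not reflected in the exit status.
CHILD = [
    sys.executable,
    "-c",
    "import sys; "
    "print('No free space left on device.', file=sys.stderr); "
    "sys.exit(0)",
]


def run_or_error(cmd: list[str], message: str) -> str:
    """Model `cmd || error "message"`: the fallback runs only on non-zero exit."""
    status = subprocess.run(cmd).returncode
    if status != 0:
        return message  # what the parent would report
    return "continued"  # the parent carries on as if nothing went wrong


print(run_or_error(CHILD, "Partitioning error"))
```

In this model the child's message still reaches the terminal, but the parent reports "continued" instead of "Partitioning error", which is consistent with what I see: the error from partedit/autopart appears and the install then proceeds.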