Bug 234576 - hastd exits ungracefully
Summary: hastd exits ungracefully
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.0-RELEASE
Hardware: Any
OS: Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2019-01-02 15:48 UTC by rasmus
Modified: 2019-07-08 07:25 UTC
CC List: 2 users

See Also:


Attachments
patch to hastd to work around this issue (3.77 KB, patch)
2019-01-25 10:40 UTC, Paul Thornton

Description rasmus 2019-01-02 15:48:01 UTC
OS: FreeBSD 12.0-RELEASE

I want to build a poor man's HAST + bhyve solution, running the OS and the HAST device on the same physical disk on each server. However, hastd dies and produces the following in /var/log/messages:

Jan  2 16:11:13 vip02 ZFS[9633]: vdev state changed, pool_guid=$12664153310291685811 vdev_guid=$13719543425671811341
Jan  2 16:11:13 vip02 ZFS[9634]: vdev is removed, pool_guid=$12664153310291685811 vdev_guid=$13719543425671811341
Jan  2 16:11:18 vip02 hastd[9563]: [test] (primary) Worker process exited ungracefully (pid=9588, exitcode=71).

If I use the partition (ada0p4) directly, without using HAST, I am able to install FreeBSD as a guest just fine. 

How to reproduce:

- Configure bhyve

https://www.freebsd.org/doc/handbook/virtualization-host-bhyve.html

Section 21.7.1 followed to the letter (the host preparation commands are summarized below).
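
For reference, the host preparation from that section boils down to roughly the following (condensed from the handbook; igb0 here is a placeholder for the actual NIC):

# kldload vmm
# ifconfig tap0 create
# sysctl net.link.tap.up_on_open=1
# ifconfig bridge0 create
# ifconfig bridge0 addm igb0 addm tap0
# ifconfig bridge0 up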

- Configure hastd

https://www.freebsd.org/doc/handbook/disks-hast.html

Section 17.14.2 followed to the letter (the hastctl initialization commands are summarized after the configuration file below).

# cat /etc/hast.conf
resource test {
        on vip01 {
                local /dev/ada0p4
                remote 192.168.212.202
        }
        on vip02 {
                local /dev/ada0p4
                remote 192.168.212.201
        }
}
#
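
For reference, the initialization from that handbook section boils down to roughly the following, using this report's resource and host names (a summary only, not a substitute for the handbook):

# hastctl create test           (on both vip01 and vip02)
# service hastd onestart        (on both nodes)
# hastctl role primary test     (on vip01)
# hastctl role secondary test   (on vip02)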

- Setup ZFS pool and create volume for guest

# zpool create ztest /dev/hast/test
# zfs create -o volmode=dev ztest/fbsdguestdisk0
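
Before starting the guest, it may be worth confirming that the HAST resource and the pool are healthy, e.g.:

# hastctl status test
# zpool status ztest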

- Start installing FreeBSD in the guest

# sh /usr/share/examples/bhyve/vmrun.sh -c 1 -m 1024M -t tap0 -d /dev/zvol/ztest/fbsdguestdisk0 -i -I FreeBSD-12.0-RELEASE-amd64-disc1.iso fbsdguest


- Start the installation with all defaults. The disk can be seen just fine, but choosing either ZFS or UFS on that disk results in hastd exiting ungracefully.

In the guest it spews out:

***
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
kern.geom.part.mbr.enforce_chs: 0 -> 0
***

and then stalls (of course, since the device vanished).

According to the documentation:

https://www.freebsd.org/doc/handbook/disks-hast.html

"There is no difference between using HAST-provided devices and raw disks or partitions."

But that does not seem to be the case, since it works fine if I skip HAST and configure the zpool directly on ada0p4 on one of the servers.
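
For clarity, the non-HAST control test is simply the same steps against the raw partition, roughly:

# zpool create ztest /dev/ada0p4
# zfs create -o volmode=dev ztest/fbsdguestdisk0
# sh /usr/share/examples/bhyve/vmrun.sh -c 1 -m 1024M -t tap0 -d /dev/zvol/ztest/fbsdguestdisk0 -i -I FreeBSD-12.0-RELEASE-amd64-disc1.iso fbsdguest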

Is it because I am using HAST on a partition?
Comment 1 Paul Thornton 2019-01-25 09:16:33 UTC
I see exactly the same hastd issue on 12.0-RELEASE-p2, with HAST directly on top of the drives (no partitions), so I don't think that specifically is your problem. HAST seems to be broken in some other way with 12.0.

However, my setup is slightly complicated as I have a zpool using GELI devices running above HAST. I am currently doing some testing to reduce this to the simplest reproducible setup, removing everything else, and then turn up some debugging.

What I've noted so far is:
1) All of the hastd worker threads die virtually simultaneously.
2) This doesn't appear to happen immediately when you start writing data, but a very short while afterwards (on the order of a few seconds).

As a side note for anyone else reading, I had issues making HAST work reliably in my setup under 11 as well, but this was easier to track down and patch.  The high level problem I found there was that ggate_recv received more data than MAXPHYS and the "impossible" condition of ENOMEM happened (line 1264 of primary.c).  After adding some logging here, I "fixed" this by setting gctl_length to MAXPHYS + 0x200 in both primary.c and secondary.c which stopped the problem; this isn't exactly elegant but it worked OK for me.

The issue reported in this bug seems unrelated to that.
Comment 2 Paul Thornton 2019-01-25 10:38:30 UTC
I have reproduced this problem in the simplest form I can with ZFS: A ZFS pool made up of one HAST device, with a disk as the underlying storage.

It appears (as is always the case when you make statements like this) that I was wrong earlier - this problem *is* directly related to the issues I had with 11.x and MAXPHYS being too small - my added debug to hastd/primary.c showed:

hastd: disk2 (primary) hastd 3741 - - [disk2] (primary) G_GATE_CMD_START failed: Cannot allocate memory.  This is in the source as being 'impossible' and appears to have happened.  allocated=20200 length=100000 maxphys=20000

I don't know enough about the internals of GEOM_GATE to know what/why/how this changed between 11.x and 12.0, but I've changed my patch to allow a length of 0x100000 (1 MiB, versus the 0x20000 = 128 KiB MAXPHYS shown in the log above), which now makes it work. This is likely not the best fix for the problem, and someone with more knowledge may want to chime in with a better solution. I've also not done extensive testing, so there may be occasions where the problem recurs, but the extra log message remains in the patch so you can immediately see if this is/was a problem.

I'm testing this with bonnie++ in case that makes a material difference (which is how we tested under 11.x).
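
For reference, a bonnie++ run against the test pool looks something like this (mount point, size and user are placeholders, not the exact command used here):

# bonnie++ -d /pool/bonnie -s 4096 -u root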
Comment 3 Paul Thornton 2019-01-25 10:40:06 UTC
Created attachment 201388 [details]
patch to hastd to work around this issue
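
For anyone wanting to try it, rebuilding hastd with the patch applied is roughly (assuming a matching source tree in /usr/src; the patch file name is a placeholder and the -p level may need adjusting):

# cd /usr/src
# patch -p0 < /path/to/hastd-maxphys.patch
# cd sbin/hastd && make && make install
# service hastd restart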
Comment 4 Paul Thornton 2019-01-26 21:51:21 UTC
Follow-up on this: after a bit more testing, I now get occasional kernel panics caused by ZFS because the pool is hung, due to the HAST vdev.

Despite running the hastd worker process with ktrace attached to it (that made for a large file), there's no obvious clue as to why it hangs.
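
Attaching ktrace to a running worker looks roughly like this (the worker PID is visible in ps or in hastd's log messages; a sketch, not the exact commands used):

# ktrace -p <worker-pid>        (start tracing the running worker)
  ... reproduce the hang ...
# ktrace -C                     (stop tracing)
# kdump -f ktrace.out | less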

I'm going to try a few more bits of logging/debugging to try to understand what happens; if I can't see what's going on, I'll revert to 11.2, which works reliably for me.
Comment 5 Michel Le Cocq 2019-07-08 07:25:30 UTC
Hi, I see exactly the same hastd issue on 12.0-RELEASE-p5 and also on 12.0-RELEASE-p7. I tried with HAST directly on top of the drives (no partitions) and also on a ZFS GPT partition.

I use HAST to sync the SSD ZIL drives of a ZFS pool.

+---------------------------------+
|            disk bay 1           |
+---------------------------------+
      |                      |
+----------+           +----------+
| server A |           | server B |
| ssds ZIL |-sync hast-| ssds ZIL |
|          |           |          |
+----------+           +----------+
      |                      |     
+---------------------------------+
|            disk bay 2           |
+---------------------------------+

So I have two raidz3 pools, one on 'disk bay 1' and one on 'disk bay 2'.
Each has its own ZIL.

Servers A and B each have 4 SSDs.

Here is what server A sees when it manages baie1:

[root@server A ~/]# zpool status

        NAME                 STATE     READ WRITE CKSUM
        baie1                ONLINE       0     0     0
          raidz3-0           ONLINE       0     0     0
            multipath/sas0   ONLINE       0     0     0
            [...] 
            multipath/sas11  ONLINE       0     0     0
        logs
          mirror-1           ONLINE       0     0     0
            hast/zil-baie1-0 ONLINE       0     0     0
            hast/zil-baie1-1 ONLINE       0     0     0

[root@server A ~/]# hastctl status
Name    Status   Role           Components
zil-baie1-0      complete primary        /dev/mfisyspd5  serverb.direct
zil-baie1-1      complete primary        /dev/mfisyspd6  serverb.direct
zil-baie2-0      complete secondary      /dev/mfisyspd8  serverb.direct
zil-baie2-1      complete secondary      /dev/mfisyspd9  serverb.direct
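
(For completeness, a log mirror like the one above would typically have been attached with something along these lines; a reconstruction, not the exact command used:)

# zpool add baie1 log mirror hast/zil-baie1-0 hast/zil-baie1-1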

Paul Thornton said:

1) All of the hastd worker threads die virtually simultaneously.

In fact, not exactly. I lose only the HAST workers that back the pool that is doing the writing.
If the second pool has no writes, those workers (the second pool's ones) stay alive and keep my 'second' ZIL alive.

2) This doesn't appear to happen immediately when you start writing data, but a very short while afterwards (on the order of a few seconds).

Yes, if you look at drive activity with gstat, you can see some writes hit the ZIL, then hastd crashes and the ZIL drives disappear.

In my case it only happens when my ZIL is used.
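
A crude way to exercise the ZIL on demand, and so trigger the problem, is to force synchronous writes for a moment, e.g. something like this (pool name taken from the layout above; untested sketch):

# zfs set sync=always baie1
# dd if=/dev/zero of=/baie1/zil-test bs=128k count=10000
# zfs inherit sync baie1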

I haven't tried the patch because I didn't want to risk a kernel panic, and I can't go back to 11-RELEASE because I use LACP over a Broadcom 10Gb SFP+ NIC that is not supported in 11-RELEASE.