Bug 268823 - Kerberized NFS mount with "gssname" option does not work
Summary: Kerberized NFS mount with "gssname" option does not work
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Rick Macklem
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-07 23:12 UTC by Rick Macklem
Modified: 2023-01-26 14:52 UTC
CC List: 1 user

See Also:
Flags:
rmacklem: mfc-stable13+
rmacklem: mfc-stable12+


Attachments
replace desired_name with GSS_C_NO_NAME so gss_acquire_cred() works (571 bytes, patch)
2023-01-07 23:12 UTC, Rick Macklem
Increase timeout for upcalls to the gssd(8) daemon (640 bytes, patch)
2023-01-11 16:31 UTC, Rick Macklem

Description Rick Macklem 2023-01-07 23:12:16 UTC
Created attachment 239339
replace desired_name with GSS_C_NO_NAME so gss_acquire_cred() works

If you attempt a Kerberized NFS mount with the gssname option such as:
# mount -t nfs -o nfsv4,sec=krb5,gssname=host nfs-server:/ /mnt
the gssd daemon gets stuck in the gss_acquire_cred() library call
for several seconds.  It then returns success, but the credentials
are bogus.

A workaround is:
# kinit -k host/nfs-client.domain
# mount -t nfs -o nfsv4,sec=krb5 nfs-server:/ /mnt

The one-line patch in the attachment seems to fix the problem.

I have no idea how long this bug has existed, but I suspect it
has been broken for quite a while, due to some change in the Heimdal
GSSAPI library.
Comment 1 Rick Macklem 2023-01-07 23:13:39 UTC
The patch in the attachment has been committed to main.
Comment 2 Rick Macklem 2023-01-11 16:31:14 UTC
Created attachment 239406
Increase timeout for upcalls to the gssd(8) daemon

It turns out that the underlying problem that
broke Kerberized NFS mounts using gssname was
a 25sec timeout on the kernel GSSAPI upcall.

For some reason, gss_acquire_cred() with a
principal name argument now takes about 28sec
to complete. The upcall would time out. The
kernel code would assume the gssd had died
and, as such, would close the socket. Ironically,
this does cause the gssd daemon to terminate
via a SIGPIPE signal.

This patch increases the timeout. With this patch,
but not the patch in attachment #239339, the mount
works, but takes almost 30sec to complete, so I think
applying both patches is appropriate.

NB: The timeout increase is needed when a user's TGT
    has expired, since gss_init_sec_context() takes
    over 25sec in that case, as well.
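The failure sequence described in this comment can be reproduced in miniature in userland. First the upcall times out, then the kernel closes the socket, and the daemon is killed by SIGPIPE when it finally replies.

The C sketch below is a hypothetical analogue, not the kgssapi or gssd(8) code:
- A parent process stands in for the kernel. It waits on a socketpair(2) using poll(2).
- A forked child stands in for gssd(8). It sleeps to simulate the slow gss_acquire_cred() call, then writes its reply.

The function name simulate_upcall() and the millisecond durations are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical userland analogue of this PR; NOT the kgssapi code.
 * Parent = "kernel": waits up to timeout_ms for a one-byte reply.
 * Child  = "gssd(8)-like daemon": its library call takes work_ms.
 * Returns the signal that killed the child, 0 if it exited normally,
 * or -1 on setup failure.
 */
static int simulate_upcall(int work_ms, int timeout_ms, bool *timed_out)
{
    int sv[2];
    int status;

    assert(timed_out != NULL);
    *timed_out = false;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* "Daemon": the default SIGPIPE action terminates the process. */
        signal(SIGPIPE, SIG_DFL);
        close(sv[0]);
        struct timespec ts = { work_ms / 1000, (work_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);               /* the slow library call */
        char reply = 'R';
        /* If the "kernel" already closed its end, this raises SIGPIPE. */
        ssize_t w = write(sv[1], &reply, 1);
        (void)w;
        _exit(0);
    }

    /* "Kernel": wait for the reply, but only for timeout_ms. */
    close(sv[1]);
    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n == 0) {
        *timed_out = true;                  /* assume the daemon died */
    } else if (n > 0) {
        char reply;
        ssize_t r = read(sv[0], &reply, 1);
        (void)r;
    }
    close(sv[0]);                           /* closes the socket either way */

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The durations are scaled down, and the margins are exaggerated so the outcome does not depend on scheduling.
- `simulate_upcall(300, 100, &timed_out)` plays the original situation, where the reply is slower than the timeout. It should report that the child died from SIGPIPE.
- `simulate_upcall(50, 1000, &timed_out)` plays the situation after the timeout is raised. It should return 0.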
Comment 3 commit-hook 2023-01-11 21:22:38 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e3c26ce5cb410e4e58e131dfea7054e0bf11e3ca

commit e3c26ce5cb410e4e58e131dfea7054e0bf11e3ca
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2023-01-11 21:20:31 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2023-01-11 21:20:31 +0000

    kgssapi: Increase timeout for kernel to gssd(8) upcalls

    It turns out that the underlying problem that caused
    a Kerberized NFS mount with the "gssname" option to
    fail was that the kernel upcall to the gssd(8) daemon
    would time out prematurely after 25 seconds.  The
    gss_acquire_cred() GSSAPI library call
    takes about 27 seconds for the case where a desired_name
    argument is specified.  A similarly long delay occurs
    when the gss_init_sec_context() call is made and the
    user principal's TGT has expired.

    Once the upcall timed out, the kernel code assumed that
    the gssd(8) daemon had died and closed the socket.
    Ironically, closing the socket did cause the gssd(8)
    daemon to terminate via a SIGPIPE signal.

    This patch increases the timeout to 5 minutes.  Since
    a timeout should only occur when the gssd(8) daemon
    has died, a long timeout should be ok and seems to fix this
    problem.

    I still think that commit c33509d49a should remain in the
    system, since it allows the mount to complete quickly
    and not take nearly 30 seconds.

    PR:     268823
    MFC after:      2 weeks

 sys/kgssapi/gss_impl.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
Comment 4 Peter Eriksson 2023-01-11 22:00:04 UTC
Interesting! 

I recently ran into a similar bug on a Linux client (one of 10 identical clients we use to measure NFS mount latency and run general functionality tests from various places around the university), where a sec=krb5 NFSv4 mount from a FreeBSD server would take a long time (the testing system terminated the mount attempt after 5s) instead of completing in the usual sub-second time.

And, in typical Kerberos style, hardly any usable error messages were logged anywhere, so I ended up having to snoop network traffic and trace syscalls to figure things out... sigh.

There, the problem turned out to be an expired /tmp/krb5cc_0 ticket. On that Linux client, this ticket would normally be managed by the "gssproxy" daemon, which maintains and refreshes it, but while diagnosing a problem I had manually run:

  kinit -k 'HOSTNAME$'

to test and see if the host ticket/principal was valid (I had renamed the client).

This caused the upcall from the kernel to gssproxy to take a very long time, and it would finally return an invalid/expired ticket to the kernel, which caused timeouts and other problems further down the chain.

(The problem was solved by deleting my manually "installed" ticket and letting gssproxy handle it :-)
Comment 5 Rick Macklem 2023-01-12 02:53:13 UTC
It turned out that the long delay was caused by a
misconfigured DNS. Although I did not have "dns"
in the /etc/nsswitch.conf, I did have a /etc/resolv.conf
on the machine and, because of that, it tried to contact
the DNS server for something like 27 seconds.

Deleting /etc/resolv.conf fixed the delay.

So, the first patch has now been reverted from "main"
and the second patch has been committed to "main" so
that the gssd(8) daemon will not terminate if the
GSSAPI library is slow to return.

With the second patch applied to your kernel, the
mount will succeed even if your DNS is misconfigured,
although it may take close to 30 seconds.
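This comment attributes the roughly 27-second delay to a specific client setup. The client has a /etc/resolv.conf file, but its /etc/nsswitch.conf does not list dns. The C sketch below checks for that combination.

Some caveats apply:
- The helper names are hypothetical and not part of FreeBSD.
- The parser is a simplification of nsswitch.conf(5) syntax. It only looks for a `dns` token on a `hosts:` line.
- A positive result only flags the configuration. It does not prove that the GSSAPI library will consult DNS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical diagnostic helper, not part of FreeBSD: returns true if an
 * nsswitch.conf line is a "hosts:" entry listing the "dns" source, e.g.
 * "hosts: files dns".  Text after '#' is treated as a comment.
 */
static bool hosts_line_uses_dns(const char *line)
{
    char buf[512];
    char *save = NULL;
    char *tok;

    assert(line != NULL);
    snprintf(buf, sizeof(buf), "%s", line);
    char *hash = strchr(buf, '#');
    if (hash != NULL)
        *hash = '\0';

    tok = strtok_r(buf, " \t\r\n", &save);
    if (tok == NULL || strcmp(tok, "hosts:") != 0)
        return false;
    while ((tok = strtok_r(NULL, " \t\r\n", &save)) != NULL) {
        if (strcmp(tok, "dns") == 0)
            return true;
    }
    return false;
}

/* Returns true if any "hosts:" line in the file at path lists "dns". */
static bool nsswitch_uses_dns(const char *path)
{
    char line[512];
    bool found = false;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return false;
    while (!found && fgets(line, sizeof(line), fp) != NULL)
        found = hosts_line_uses_dns(line);
    fclose(fp);
    return found;
}

/*
 * The combination described in this PR: a resolv.conf file exists, but
 * nsswitch.conf does not list dns for host lookups.
 */
static bool resolv_conf_without_dns(const char *resolv_path,
    const char *nsswitch_path)
{
    return access(resolv_path, F_OK) == 0 &&
        !nsswitch_uses_dns(nsswitch_path);
}
```

On a client configured like the one in this PR, the following call would presumably return true:

`resolv_conf_without_dns("/etc/resolv.conf", "/etc/nsswitch.conf")`

If the result is true, either remove /etc/resolv.conf (as was done here) or fix the DNS setup it points to.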
Comment 6 commit-hook 2023-01-26 03:04:07 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=08b2c77707036768099e7df66222f75da877ebb7

commit 08b2c77707036768099e7df66222f75da877ebb7
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2023-01-11 21:20:31 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2023-01-26 03:02:18 +0000

    kgssapi: Increase timeout for kernel to gssd(8) upcalls

    PR:     268823

    (cherry picked from commit e3c26ce5cb410e4e58e131dfea7054e0bf11e3ca)

 sys/kgssapi/gss_impl.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
Comment 7 commit-hook 2023-01-26 03:09:10 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9d35b4b38e1a466371c551568538488b6f873d07

commit 9d35b4b38e1a466371c551568538488b6f873d07
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2023-01-11 21:20:31 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2023-01-26 03:07:41 +0000

    kgssapi: Increase timeout for kernel to gssd(8) upcalls

    PR:     268823

    (cherry picked from commit e3c26ce5cb410e4e58e131dfea7054e0bf11e3ca)

 sys/kgssapi/gss_impl.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
Comment 8 Rick Macklem 2023-01-26 14:52:32 UTC
The second attachment has now been committed
and MFC'd.

Note that the long (30 sec) delay in gss_acquire_cred()
and gss_init_sec_context() was caused by a misconfigured
DNS setup. (Apparently the library tries to use DNS when
a /etc/resolv.conf file exists, even if use of DNS for
host lookup is not specified in /etc/nsswitch.conf.)
This is the case in which this patch is needed to prevent
the gssd(8) daemon from being terminated.