Bug 235110 - 13.0-CURRENT drops to debugger on shutdown with IPNAT enabled.
Summary: 13.0-CURRENT drops to debugger on shutdown with IPNAT enabled.
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Cy Schubert
URL:
Keywords:
Depends on: 191343
Blocks:
Reported: 2019-01-21 15:23 UTC by David.Boyd49
Modified: 2024-01-10 02:51 UTC
CC List: 4 users

See Also:


Attachments
screenshot of debugger output (20.40 KB, image/png)
2019-01-21 15:23 UTC, David.Boyd49
screenshot of backtrace (16.57 KB, image/png)
2019-01-21 15:24 UTC, David.Boyd49
same deal (213.81 KB, image/jpeg)
2019-01-24 21:02 UTC, waitman
uname -a output (83 bytes, text/plain)
2019-01-26 17:17 UTC, David.Boyd49
kldstat output (736 bytes, text/plain)
2019-01-26 17:26 UTC, David.Boyd49
ifconfig -a output (1.11 KB, text/plain)
2019-01-26 17:27 UTC, David.Boyd49
ipf.rules (846 bytes, text/plain)
2019-01-26 17:27 UTC, David.Boyd49
ipnat.rules (555 bytes, text/plain)
2019-01-26 17:28 UTC, David.Boyd49
dump output (102.56 KB, text/plain)
2019-01-26 17:28 UTC, David.Boyd49
excerpt from /etc/rc.conf.local with _flags (281 bytes, text/plain)
2019-01-26 17:29 UTC, David.Boyd49
kgdb command output (13.06 KB, text/plain)
2019-01-27 18:29 UTC, David.Boyd49
Patch for PR 235110 (505 bytes, patch)
2019-01-28 03:52 UTC, Cy Schubert
dump output after patch was applied (96.34 KB, text/plain)
2019-01-29 17:30 UTC, David.Boyd49
Fix for second panic (667 bytes, patch)
2019-01-29 21:03 UTC, Cy Schubert

Description David.Boyd49 2019-01-21 15:23:37 UTC
Created attachment 201308
screenshot of debugger output

13.0-CURRENT drops to debugger on shutdown with IPNAT enabled.

Running in VirtualBox 6.0.2.

The identical configuration running 12.0-RELEASE-p2, 12.0-STABLE, 11.2-RELEASE-p8, and 11.2-STABLE does not exhibit this behavior.

This (VM) is a test machine and I can do anything that will help
identify the root of this problem.

I have included screenshots of the debug output and a backtrace.


Thanks.

David Boyd.
Comment 1 David.Boyd49 2019-01-21 15:24:18 UTC
Created attachment 201309
screenshot of backtrace
Comment 2 waitman 2019-01-24 21:02:24 UTC
Created attachment 201380
same deal

I've also been having this problem, pretty much since I first pulled the source for 13.0-CURRENT, moving from 12.0-CURRENT.
Comment 3 Cy Schubert freebsd_committer freebsd_triage 2019-01-25 02:29:59 UTC
(In reply to waitman from comment #2)

This panic is totally unrelated and probably suggests that your panics are not ipfilter related but some other deeper cause.
Comment 4 Cy Schubert freebsd_committer freebsd_triage 2019-01-25 02:58:25 UTC
I'll only look at the ipfilter issue. The waitman@waitman.net issue needs to have its own PR.

Some questions:

1. What is the host that VirtualBox is running on? Windows? Solaris?
1a. Are VirtualBox additions installed on the VM?

2. uname -a please.

3. kldstat output.

4. ifconfig -a output.

5. a listing of your ipf.conf and ipnat.conf rules.

6. is ippool in use? If yes, ippool.conf please.

7. It would help a lot if you could get a dump.
Comment 5 Cy Schubert freebsd_committer freebsd_triage 2019-01-25 03:27:59 UTC
More questions (still need the first seven answered though).

8. Which ipfilter services are enabled? ipf, ipnat, ipfs? (ipfilter isn't called during shutdown except when ipfs is enabled.)

9. What flags are enabled for each service? (Specifically, is -p specified for ipnat? But others too.)
Comment 6 David.Boyd49 2019-01-26 17:17:12 UTC
Created attachment 201416
uname -a output
Comment 7 David.Boyd49 2019-01-26 17:25:11 UTC
1. Host is CentOS EL7 (7.6.1810)
1a.  VirtualBox Guest Additions are installed in the VM.

2. See attachment.  (uname -a output)

3. See attachment.  (kldstat output)

4. See attachment.  (ifconfig output)

5. See (2) attachments.  (ipf.rules and ipnat.rules)

6. ippool is not in use.

7. See attachment.  (core.txt.0)

8. ipf, ipfs, ipmon, ipnat.

9. See attachment.  (ipf-ipfs-ipmon-ipnat excerpt from /etc/rc.conf.local)
Comment 8 David.Boyd49 2019-01-26 17:26:32 UTC
Created attachment 201417
kldstat output
Comment 9 David.Boyd49 2019-01-26 17:27:07 UTC
Created attachment 201418
ifconfig -a output
Comment 10 David.Boyd49 2019-01-26 17:27:54 UTC
Created attachment 201419
ipf.rules
Comment 11 David.Boyd49 2019-01-26 17:28:15 UTC
Created attachment 201420
ipnat.rules
Comment 12 David.Boyd49 2019-01-26 17:28:55 UTC
Created attachment 201421
dump output
Comment 13 David.Boyd49 2019-01-26 17:29:45 UTC
Created attachment 201422
excerpt from /etc/rc.conf.local with _flags
Comment 14 Cy Schubert freebsd_committer freebsd_triage 2019-01-26 18:33:21 UTC
As suspected, ipfs is involved.

It would help if I could get a copy of the dump itself, but we'll try having you be my hands and eyes first.

Considering the output you have posted, devel/gdb is installed or you're using the deprecated copy in base.

Go into kgdb as you did before and enter:

frame 17

p ipn     <-- This should not be NULL as it's tested at line 1822 above.

p ipn->ipn_ipnat.in_size

p &ipn->ipn_ipnat

p ipn->ipn_ipnat

Just out of curiosity, 

p nat->nat_ptr

Otherwise, you might need to put the dump file somewhere (not here) so I can fetch it. I'll let you know if we need to do that, but for now the outputs above should hopefully give us the first hint of what might be happening.
Comment 15 David.Boyd49 2019-01-27 01:50:06 UTC
All I get from "frame 17" is: no such command; use "help" to list available commands.

Sorry if I gave the impression that I know what I'm doing ... not so much.
Comment 16 Cy Schubert freebsd_committer freebsd_triage 2019-01-27 02:19:37 UTC
No worries.

su to root, su - or sudo -i. Either works.

cd /var/crash

If devel/gdb is installed: kgdb /boot/kernel/kernel vmcore.last

If devel/gdb is not installed use /usr/libexec/kgdb instead. Then enter the frame command and the rest of the kgdb commands. To save time cutting and pasting here, run script(1) first. Then upload the file called typescript into this PR.

If it would be easier, do you have a site you can put vmcore.0 (0 could be any number)? I can download the vmcore file and build r343372 here (because it contains the correct offsets for the debugging symbols).
Comment 17 David.Boyd49 2019-01-27 18:29:13 UTC
Created attachment 201456
kgdb command output

Cy,

Attached is the output of the dump commands.

Hope this helps.

I'll do whatever I can to help with this issue.

David.
Comment 18 Cy Schubert freebsd_committer freebsd_triage 2019-01-28 03:52:27 UTC
Created attachment 201468
Patch for PR 235110

Try the attached patch.
Comment 19 David.Boyd49 2019-01-29 17:30:16 UTC
Created attachment 201511
dump output after patch was applied

The patch was applied but shutdown resulted in another crash.

Let me know what you want me to do next.

Thanks.

David.
Comment 20 Cy Schubert freebsd_committer freebsd_triage 2019-01-29 20:40:11 UTC
Looks like the patch fixed one panic and we hit another. ipf_nat_getent() is trying to obtain a lock it already obtained earlier. I'll send you a patch for this later.
Comment 21 Cy Schubert freebsd_committer freebsd_triage 2019-01-29 21:03:34 UTC
Created attachment 201516
Fix for second panic

Can you please also apply this patch? Use the other patch too. This fixes the subsequent witness panic.
Comment 22 David.Boyd49 2019-01-30 15:14:39 UTC
Cy,

That seems to have fixed this problem.

Thanks.

David


Originally, I was attempting to check whether or not 13.0-CURRENT still exhibited the symptoms we once worked on in PR 191343, which was closed last year after 4 years of effort.

It does (sort of) with a slightly different error message.

Should I open a new PR, e-mail current@freebsd.org, or do something else?

What do you think?

Thanks, again.

David.
Comment 23 Cy Schubert freebsd_committer freebsd_triage 2019-01-30 20:08:58 UTC
That's good to hear.

Just reopen the old PR and attach the new messages. Did you try the fix I posted? The PR was closed because you never replied.
Comment 24 commit-hook freebsd_committer freebsd_triage 2019-01-30 20:23:19 UTC
A commit references this bug:

Author: cy
Date: Wed Jan 30 20:22:34 UTC 2019
New revision: 343590
URL: https://svnweb.freebsd.org/changeset/base/343590

Log:
  When copying a NAT rule struct to userland for save by ipfs, use the
  length of the struct in memmove() rather than an uninitialized variable.
  This fixes the first of two kernel page faults when ipfs is invoked.

  PR:		235110
  Reported by:	David.Boyd49@twc.com
  MFC after:	2 weeks

Changes:
  head/sys/contrib/ipfilter/netinet/ip_nat.c
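The log above describes copying a NAT rule to userland with memmove() using the length of the struct instead of an uninitialized variable. A minimal userland sketch of that pattern follows; `struct nat_rule` and `copy_rule_out()` are hypothetical, much-simplified stand-ins rather than the real ipnat_t layout or the ip_nat.c code, and the destination-size check is an addition for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for a NAT rule. */
struct nat_rule {
    int  in_size;
    int  in_flags;
    char in_name[16];
};

/*
 * Buggy shape (not compiled): the length passed to memmove() is an
 * uninitialized local, so the number of bytes copied is undefined.
 *
 *     size_t len;
 *     memmove(dst, src, len);
 */

/* Fixed shape: copy exactly sizeof(struct nat_rule) bytes. Returns the
 * number of bytes copied, or 0 if dst is NULL or too small. */
static size_t copy_rule_out(void *dst, size_t dstlen,
                            const struct nat_rule *src)
{
    const size_t len = sizeof(*src);

    assert(src != NULL);
    if (dst == NULL || dstlen < len)
        return 0;
    memmove(dst, src, len);
    return len;
}
```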
Comment 25 commit-hook freebsd_committer freebsd_triage 2019-01-30 20:24:22 UTC
A commit references this bug:

Author: cy
Date: Wed Jan 30 20:23:16 UTC 2019
New revision: 343591
URL: https://svnweb.freebsd.org/changeset/base/343591

Log:
  Do not obtain an already held read lock. This causes a witness panic when
  ipfs is invoked. This is the second of two panics resolving PR 235110.

  PR:		235110
  Reported by:	David.Boyd49@twc.com
  MFC after:	2 weeks

Changes:
  head/sys/contrib/ipfilter/netinet/ip_nat.c
Comment 26 Cy Schubert freebsd_committer freebsd_triage 2019-02-09 01:42:08 UTC
This fix is incorrect. PR 191343 will require a complete assessment of the ipfs feature.
Comment 27 commit-hook freebsd_committer freebsd_triage 2019-02-14 00:52:24 UTC
A commit references this bug:

Author: cy
Date: Thu Feb 14 00:52:04 UTC 2019
New revision: 344113
URL: https://svnweb.freebsd.org/changeset/base/344113

Log:
  MFC r343591:

  Do not obtain an already held read lock. This causes a witness panic when
  ipfs is invoked. This is the second of two panics resolving PR 235110.

  PR:		235110
  Reported by:	David.Boyd49@twc.com

Changes:
_U  stable/10/
  stable/10/sys/contrib/ipfilter/netinet/ip_nat.c
_U  stable/11/
  stable/11/sys/contrib/ipfilter/netinet/ip_nat.c
_U  stable/12/
  stable/12/sys/contrib/ipfilter/netinet/ip_nat.c
Comment 28 Mark Linimon freebsd_committer freebsd_triage 2024-01-10 02:51:34 UTC
^Triage: committed back in 2019.