Bug 228807 - dump can core
Summary: dump can core
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Warner Losh
URL:
Keywords:
: 228882 (view as bug list)
Depends on:
Blocks: 228911
  Show dependency treegraph
 
Reported: 2018-06-07 15:36 UTC by Diane Bruce
Modified: 2018-07-14 20:14 UTC (History)
6 users (show)

See Also:


Attachments
Simple fix for now to abort dump if fs is too large (997 bytes, patch)
2018-06-11 16:46 UTC, Diane Bruce
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Diane Bruce freebsd_committer 2018-06-07 15:36:12 UTC
(lldb) bt
* thread #1, name = 'dump', stop reason = signal SIGSEGV
  * frame #0: 0x00000000002091f1 dump`flushtape + 1889
    frame #1: 0x000000000020c53a dump`dumpmap + 250
    frame #2: 0x00000000002068db dump`main + 3803
    frame #3: 0x0000000000205095 dump`_start + 149
(lldb) up
frame #1: 0x000000000020c53a dump`dumpmap + 250
dump`dumpmap:
    0x20c53a <+250>: addl   $0x1, %ebx
    0x20c53d <+253>: addq   $0x400, %r14              ; imm = 0x400
    0x20c544 <+260>: cmpl   0x3c2e(%rip), %ebx        ; u_spcl + 160
    0x20c54a <+266>: jl     0x20c530                  ; <+240>
(lldb) up
frame #2: 0x00000000002068db dump`main + 3803
dump`main:
    0x2068db <+3803>: movl   $0x3, 0x978f(%rip)
    0x2068e5 <+3813>: movq   0x9744(%rip), %rsi        ; disk
    0x2068ec <+3820>: xorl   %ebx, %ebx
    0x2068ee <+3822>: movl   $0x200aae, %edi           ; imm = 0x200AAE
(lldb) p sblock
(void *) $0 = 0x00000008006a7000
(lldb) x/x sblock
0x8006a7000: 0x00000000
(lldb)
0x8006a7004: 0x00000000
(lldb)
0x8006a7008: 0x00000018
(lldb)
0x8006a700c: 0x00000020
(lldb)
0x8006a7010: 0x00000028
(lldb)
0x8006a7014: 0x000013c0
Comment 1 Diane Bruce freebsd_committer 2018-06-08 19:35:05 UTC
dump cores only when using a SATA SSD drive, on spinning rust it's fine.
Comment 2 Diane Bruce freebsd_committer 2018-06-11 16:38:00 UTC
> > +   sblock = NULL;                                                          
> >     if ((ret = sbget(diskfd, &sblock, -1)) != 0) {                          
> >             switch (ret) {                                                  
> >             case ENOENT:                                                    
                                                                                
This is all completely bogus as mentioned by @emaste. I know                    
exactly the bug now. Here are the excruciating details. (Well most of it)       
Setting sblock to NULL is not needed here because of the way sbget works.       
Heinsenberg was playing here.                                                   
                                                                                
dump main.c starts up and works out sizes of various things                     
looking at line 454 and 455 We see the following snippet                        
...                                                                             
maxino = sblock->fs_ipg * sblock->fs_ncg;                                       
mapsize = roundup(howmany(maxino, CHAR_BIT), TP_BSIZE);                         
...                                                                             
                                                                                
Looking at my SSD and using printfs in main.c we see    
sblock->fs_ipg= 80256 sblock->fs_ncg = 655 maxino=52567680                      
mapsize = 6571008                                                               
                                                                                
inodes are dumped to "tape" from dumpino() line 459 in traverse.c               
                                                                                
dumpmap is then called line 832 traverse.c                                      
                                                                                
On my SSD spcl.c_count is worked out in and this results in a spcl.c_count=6417 
                                                                                
Eventually flushtape() line 227 in tape.c is called                             
                                                                                
line 296 of tape.c we have                                                      
                                                                                
...                                                                             
        if (spcl.c_type != TS_END) {                                            
                for (i = 0; i < spcl.c_count; i++)                              
                        if (spcl.c_addr[i] != 0)                                
                                blks++;                                         
        }                                                                       
...                                                                             
spcl.c_count of 6147 is quite a bit out of bounds for spcl.c_addr               
and it segfaults here.                                                          
                                                                                
In fact, looking at /usr/include/protocols/dumprestore.h                        
                                                                                
line 117                                                                        
...                                                                             
                                                                                
                int32_t c_count;            /* number of valid c_addr entries */
                char    c_addr[TP_NINDIR];  /* 1 => data; 0 => hole in inode */ 
                char    c_label[LBLSIZE];   /* dump label */                    
...                                                                             
                                                                                
Looking at TP_NINDIR comment at line 53 of dumprestore.h                        
                                                                                
...                                                                             
                                                                                
* TP_NINDIR is the number of indirect pointers in a TS_INODE                    
* or TS_ADDR record. Note that it must be a power of two.                       

 */                                                                             
#define TP_BSIZE        1024                                                    
#define NTREC           10                                                      
#define HIGHDENSITYTREC 32                                                      
#define TP_NINDIR       (TP_BSIZE/2)                                            
                                                                                
...                                                                             
                                                                                
Hence c_addr[TP_NINDIR]; is 512 bytes. We overflow this fairly easily.          
                                                                                
&spcl = 0x2100d8                                                                
  top = 0x2104d8                                                                
                                                                                
&pcl.c_addr[3715]=0x210fff                                                      
                                                                                
(lldb) x/x 0x210ffe                                                             
0x00210ffe: 0x00000000                                                          
warning: Not all bytes (2/4) were able to be read from 0x210ffe.                
                                                                                
Boom.                                                                           
                                                                                
Apparently when the sblock buffer was static, the linked arrangment             
of sblock_buf[MAXBSIZE] and spcl from dumprestore.h was "just right"            
to mask this bounds error. This would also explain why malloc'ing               
a huge area for the superblock made no difference.                              
                                                                                
I don't know how to fix this. You will note that a dump on my spinning          
rust system will also go out of bounds in c_addr but perhaps the                
following variables are not needed at this point? e.g. c_label and up.          
It's also overflowing spcl itself and going into memory it shouldn't.           
I'd not trust the dump itself at this point.                                    
...                                                                             
                                                                                
sblock->fs_ipg= 80256 sblock->fs_ncg = 319 maxino=25601664                      
mapsize = 3201024                                                               
                                                                                
...                                                                             
spcl=0x2100d8                                                                   
top= 0x2104d8            
spcl.c_count=3126                                                               
                                                                                
spcl.c_addr[3125]=0 blks=266                                                    
                                                                                
 spcl.c_addr[3125]=0 blks=267                                                   
&spcl.c_addr[3125]=0x210db1                                                     
                                                                                
^^ it doesn't segfault here because it doesn't get to a page boundary           
but it certainly is outside the bounds of spcl.                                 
                                                                                
This was fun!
Comment 3 Diane Bruce freebsd_committer 2018-06-11 16:46:21 UTC
Created attachment 194171 [details]
Simple fix for now to abort dump if fs is too large

I have a more complex diff that I am working on which will go on phabricator if Kirk (or someone) says I am on the right track.
Comment 4 commit-hook freebsd_committer 2018-06-11 19:13:07 UTC
A commit references this bug:

Author: db
Date: Mon Jun 11 19:12:51 UTC 2018
New revision: 334968
URL: https://svnweb.freebsd.org/changeset/base/334968

Log:
  Large file systems with inodes > 512K have been silently overflowing
  c_addr in spcl. So check before we start dumping otherwise we can
  end up with a corrupted dump.

  PR:		228807
  Submitted by:	db
  Reviewed by:	imp
  Approved by:	imp

Changes:
  head/sbin/dump/main.c
Comment 5 commit-hook freebsd_committer 2018-06-11 19:33:25 UTC
A commit references this bug:

Author: imp
Date: Mon Jun 11 19:32:36 UTC 2018
New revision: 334969
URL: https://svnweb.freebsd.org/changeset/base/334969

Log:
  Add asserts to prevent overflows of c_addr.

  Add some asserts that prevents the overflows of c_addr. This can't
  happen, absent bugs. However, certain large filesystems can cause
  problems. These have been prevented by r334968, but a solution
  is needed. These asserts will help assure that solution is correct.

  PR: 228807
  Reviewed by: db

Changes:
  head/sbin/dump/tape.c
  head/sbin/dump/traverse.c
Comment 6 Rodney W. Grimes freebsd_committer 2018-06-12 08:36:54 UTC
*** Bug 228882 has been marked as a duplicate of this bug. ***
Comment 7 Rodney W. Grimes freebsd_committer 2018-06-12 08:37:53 UTC
Promoting this to the Release 12 meta bug
Comment 8 Ed Maste freebsd_committer 2018-06-19 14:23:09 UTC
Is this now fixed?
Comment 9 Warner Losh freebsd_committer 2018-06-19 14:24:41 UTC
It's fixed in head. I don't think it's been MFC'd.
Comment 10 Rodney W. Grimes freebsd_committer 2018-06-19 14:59:00 UTC
(In reply to Warner Losh from comment #9)
There is some other change that causes this to be a  bug, I am not sure that change is merged to stable/11 either.  I can build and do a test to see if this bug is in stable/11, or we can try to bisect head and find the trigger commit that made this a "bug" and make sure we merge both changes.

I'll be back home tonight where resources are more plentiful if we need to track this down.
Comment 11 Diane Bruce freebsd_committer 2018-06-19 15:35:04 UTC
(In reply to Warner Losh from comment #9)

My understanding was Kirk was going to look at it when he got back from his trip. 
dump works as is on 11.1 although it's silently overflowing c_addr like crazy for some people. It's not my call.
Comment 12 Rodney W. Grimes freebsd_committer 2018-06-19 17:15:11 UTC
(In reply to Diane Bruce from comment #11)
How do we know that it is "silently overflowing c_addr" in 11.1 if it is silent?
Also if there is a problem in 11.1 we would need to issue an errata to fix it.
And if that problem still exists in 11.2 it needs an errata too.

I guess when I get home tonight I can start to look at issues in 11/11.1/11.2.

Warner, do we know the commit in head that added the other use of c_addr?
Comment 13 Ed Maste freebsd_committer 2018-06-19 17:22:00 UTC
(In reply to Warner Losh from comment #9)
Thanks Warner - so as far as tracking this for FreeBSD 12 (PR228911) we're done.
Comment 14 Kirk McKusick freebsd_committer 2018-07-14 20:14:22 UTC
Fixed in sbin/dump/tape.c:

r334979 | imp | 2018-06-11 13:38:26 -0700 (Mon, 11 Jun 2018) | 10 lines

Fix a bug in the counting of blks.

We shouldn't count the bytes set in c_addr for TS_CLRI and TS_BITS
nodes. Those block overload c_count to communicate how many blocks
follow, not now many c_addr spaces are used. Dump would dump core
(now) because memory layout moved around and we'd access elements past
the end to make a count.

Reviewed by: kib@