Bug 226064

Summary: Observing panic at ctl_free_beio+0x80 while running traffic on 100 iSCSI LUNs
Product: Base System Reporter: Manish Kumar <manishk>
Component: kernAssignee: Edward Tomasz Napierala <trasz>
Status: New ---    
Severity: Affects Only Me CC: manishk, mav, trasz
Priority: ---    
Version: CURRENT   
Hardware: amd64   
OS: Any   
Attachments:
Description Flags
core text file
none
panic dmesg after enabling iscsi.debug=10 none

Description Manish Kumar 2018-02-20 12:08:18 UTC
Created attachment 190824 [details]
core text file

Observing panic while running traffic on 100 iSCSI LUNs (used zvols as LUNs) on the target machine. The panic stack trace:  

panic: Memory modified after free 0xfffff8016fadcb40(32) val=0 @ 0xfffff8016fadcb40

cpuid = 0
time = 1518462716
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfffffe0065865370
kdb_backtrace() at kdb_backtrace+0x53/frame 0xfffffe0065865440
vpanic() at vpanic+0x268/frame 0xfffffe0065865510
kproc_shutdown() at kproc_shutdown/frame 0xfffffe0065865570
trash_ctor() at trash_ctor+0x6b/frame 0xfffffe00658655b0
uma_zalloc_arg() at uma_zalloc_arg+0x284/frame 0xfffffe00658656f0
bucket_alloc() at bucket_alloc+0x102/frame 0xfffffe0065865730
uma_zfree_arg() at uma_zfree_arg+0x647/frame 0xfffffe0065865820
uma_zfree() at uma_zfree+0x29/frame 0xfffffe0065865850
ctl_free_beio() at ctl_free_beio+0x80/frame 0xfffffe0065865880
ctl_complete_beio() at ctl_complete_beio+0x48/frame 0xfffffe00658658a0
ctl_be_block_dispatch_zvol() at ctl_be_block_dispatch_zvol+0x664/frame 0xfffffe00658659b0
ctl_be_block_worker() at ctl_be_block_worker+0x1fb/frame 0xfffffe0065865a40
taskqueue_run_locked() at taskqueue_run_locked+0x246/frame 0xfffffe0065865ab0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe0065865b30
fork_exit() at fork_exit+0x145/frame 0xfffffe0065865bb0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0065865bb0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic


-------------------------------------
Steps to recreate the issue:
-------------------------------------
The target machine exposes 100 zvols as iSCSI LUNs. The Initiator discovers these LUNS, creates file system on them, and runs traffic (iozone). The target machine panics after few hours.

OS: FreeBSD 12.0-CURRENT (svn r329019)
I'm using GENERIC config file, with "nooptions       VIMAGE"

==>Start the Target (FreeBSD)
    1. Create 100 zvols
      # zpool create iscsi <slice/partition>
      # for ((i=0;i<100;i++)); do zfs create -V1G -o volmode=dev iscsi/d$i; done
    
    2. configure the network interface to be used (Target_IP_Address)

    2. /etc/ctl.conf file:
         portal-group pg0 {
            discovery-auth-group no-authentication
            listen <Target_IP_Address>
         }
         target iqn.2016-11.com.xyz.abc:0 {
            auth-group no-authentication
            portal-group pg0
                     lun 0 {
                              path /dev/zvol/iscsi/d0
                              size 1G
                              option vendor "foo"
                              option product "bar"
                              option revision "d0"
                     }
         }
         .
         .
      
        100 targets with 1 lun each

    3. start the target daemon 
        # service ctld onestart

==>Initiator (Linux)
    1. login/discover the target 
        # iscsiadm -m discovery -t st -p <Target_IP_Address> -l
    2. List all Luns 
	# lsscsi	
    3. Format and create file system ONLY on all discovered luns (TAKE CARE TO NOT TOUCH OTHER LUNS). 
	# mkfs.ext3 /dev/sdxx      // for all discovered LUNs
    4. mount all the devices formatted in the above step to different mount points
        # mount /dev/sdxx /mnt/iscsiyy    // for all the luns formatted in the above step
    5. Run traffic on all luns using iozone
        # cd /mnt/iscsiyy
	# iozone -a -I -+d -g 512m &
	

==> Target machine panics after 3 to 10 hours.
Comment 1 Edward Tomasz Napierala freebsd_committer freebsd_triage 2018-03-15 17:38:08 UTC
Are you able to reproduce this bug with fresh CURRENT, ie after r331013?  Thanks!
Comment 2 Manish Kumar 2018-03-27 07:37:10 UTC
(In reply to Edward Tomasz Napierala from comment #1)
I am hitting this panic with revision r331013.

The backtrace look a bit different, but the panic is same and at the same place.

-----------------------
Backtrace for panic 1:
panic: Memory modified after free 0xfffff80003035a00(32) val=0 @ 0xfffff80003035a00

cpuid = 3
time = 1522130789
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfffffe00323a7e30
kdb_backtrace() at kdb_backtrace+0x53/frame 0xfffffe00323a7f00
vpanic() at vpanic+0x268/frame 0xfffffe00323a7fd0
kproc_shutdown() at kproc_shutdown/frame 0xfffffe00323a8030
trash_ctor() at trash_ctor+0x6b/frame 0xfffffe00323a8070
uma_zalloc_arg() at uma_zalloc_arg+0x284/frame 0xfffffe00323a81b0
bucket_alloc() at bucket_alloc+0x102/frame 0xfffffe00323a81f0
uma_zfree_arg() at uma_zfree_arg+0x647/frame 0xfffffe00323a82e0
uma_zfree() at uma_zfree+0x29/frame 0xfffffe00323a8310
ctl_free_beio() at ctl_free_beio+0x80/frame 0xfffffe00323a8340
ctl_complete_beio() at ctl_complete_beio+0x48/frame 0xfffffe00323a8360
ctl_be_block_move_done() at ctl_be_block_move_done+0x266/frame 0xfffffe00323a83e0
cfiscsi_datamove_in() at cfiscsi_datamove_in+0x92d/frame 0xfffffe00323a84c0
cfiscsi_datamove() at cfiscsi_datamove+0x28/frame 0xfffffe00323a84e0
ctl_datamove() at ctl_datamove+0x498/frame 0xfffffe00323a8780
ctl_be_block_dispatch_file() at ctl_be_block_dispatch_file+0x8a8/frame 0xfffffe00323a88e0
ctl_be_block_dispatch() at ctl_be_block_dispatch+0x626/frame 0xfffffe00323a89b0
ctl_be_block_worker() at ctl_be_block_worker+0x68d/frame 0xfffffe00323a8a40
taskqueue_run_locked() at taskqueue_run_locked+0x246/frame 0xfffffe00323a8ab0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe00323a8b30
fork_exit() at fork_exit+0x145/frame 0xfffffe00323a8bb0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00323a8bb0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic


-----------------------
Backtrace for panic 2:

panic: Memory modified after free 0xfffff8001207ce40(32) val=0 @ 0xfffff8001207ce40

cpuid = 1
time = 1521585045
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfffffe002e197200
kdb_backtrace() at kdb_backtrace+0x53/frame 0xfffffe002e1972d0
vpanic() at vpanic+0x268/frame 0xfffffe002e1973a0
kproc_shutdown() at kproc_shutdown/frame 0xfffffe002e197400
trash_ctor() at trash_ctor+0x6b/frame 0xfffffe002e197440
uma_zalloc_arg() at uma_zalloc_arg+0x284/frame 0xfffffe002e197580
bucket_alloc() at bucket_alloc+0x102/frame 0xfffffe002e1975c0
uma_zfree_arg() at uma_zfree_arg+0x647/frame 0xfffffe002e1976b0
kmem_cache_free() at kmem_cache_free+0x28/frame 0xfffffe002e1976d0
zio_buf_free() at zio_buf_free+0x96/frame 0xfffffe002e197710
zil_lwb_flush_vdevs_done() at zil_lwb_flush_vdevs_done+0x71/frame 0xfffffe002e197790
zio_done() at zio_done+0x1531/frame 0xfffffe002e1979b0
zio_execute() at zio_execute+0x28b/frame 0xfffffe002e197a10
taskq_run_ent() at taskq_run_ent+0x34/frame 0xfffffe002e197a40
taskqueue_run_locked() at taskqueue_run_locked+0x246/frame 0xfffffe002e197ab0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe002e197b30
fork_exit() at fork_exit+0x145/frame 0xfffffe002e197bb0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe002e197bb0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
Comment 3 Edward Tomasz Napierala freebsd_committer freebsd_triage 2018-03-27 09:02:51 UTC
Could you provide some more of the kernel log (dmesg) before the crash happens?  If possible, enabling debug (sysctl kern.cam.ctl.iscsi.debug=10) would help.
Comment 4 Manish Kumar 2018-03-29 13:39:50 UTC
Created attachment 191927 [details]
panic dmesg after enabling iscsi.debug=10

Attaching the dmesg file after increasing the iscsi.debug level to 10.