Summary:   ena driver causing kernel panics on AWS EC2
Product:   Base System
Component: kern
Version:   11.1-RELEASE
Hardware:  amd64
OS:        Any
Status:    Closed FIXED
Severity:  Affects Only Me
Priority:  ---
Reporter:  Mike Walker <mike.walker>
Assignee:  freebsd-virtualization (Nobody) <virtualization>
CC:        ale, andreas.sommer87, berend, cperciva, crane76, elij+freebsd, in2, leif, mmpestorich, pete, pstef, richard, terje
Description
Mike Walker
2018-02-09 15:38:47 UTC
We're also affected by this, running c5.large, handling about 13,000 connections through haproxy, then varnish, and on to other systems. Activity was about 4,000 requests per minute leading up to the crash, which doesn't seem all that high. It's possible it spiked shortly before the crash without that making it into the logs.

This is:

FreeBSD [host snipped] 11.1-RELEASE-p8 FreeBSD 11.1-RELEASE-p8 #0: Tue Mar 13 17:07:05 UTC 2018 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

It's a lightly modified/configured version of one of the usual FreeBSD AMIs; I don't recall the exact AMI ID, sorry. The kernel etc. is stock -- we've only made additions in terms of software for our own AMI.

We have two virtually identical machines exposed under the same hostname, receiving near-identical load, and have so far only noticed this on one of them. Could be coincidental, but it seemed worthwhile to mention. It also strikes me as noteworthy that the data rate was only about 700 kB/s at the last data point I have before the crash. Unfortunately I don't know anything about the packet rate, and again there could have been a peak leading up to the crash that never made it into the logs.

If anyone is interested in any other data from this, please do let me know. Also, this is part of a redundant setup, allowing some extra room for moving things around if anyone wants anything tested or tried on it.

>> Crash itself:

Limiting open port RST response from 457 to 200 packets/sec
Limiting open port RST response from 487 to 200 packets/sec
Limiting open port RST response from 541 to 200 packets/sec
Limiting open port RST response from 517 to 200 packets/sec
Limiting open port RST response from 586 to 200 packets/sec
Limiting open port RST response from 237 to 200 packets/sec
ena0: Found a Tx that wasn't completed on time, qid 1, index 324.
pid 3639 (varnishd), uid 429: exited on signal 6
Limiting open port RST response from 259 to 200 packets/sec
Limiting open port RST response from 380 to 200 packets/sec
ena0: Found a Tx that wasn't completed on time, qid 1, index 181.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x1c
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff82173f8c
stack pointer           = 0x28:0xfffffe0110f43180
frame pointer           = 0x28:0xfffffe0110f43260
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq261: ena0)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80aadac7 at kdb_backtrace+0x67
#1 0xffffffff80a6bba6 at vpanic+0x186
#2 0xffffffff80a6ba13 at panic+0x43
#3 0xffffffff80ee3092 at trap_fatal+0x322
#4 0xffffffff80ee30eb at trap_pfault+0x4b
#5 0xffffffff80ee290a at trap+0x2ca
#6 0xffffffff80ec3d40 at calltrap+0x8
#7 0xffffffff80a321ec at intr_event_execute_handlers+0xec
#8 0xffffffff80a324d6 at ithread_loop+0xd6
#9 0xffffffff80a2f845 at fork_exit+0x85
#10 0xffffffff80ec4a0e at fork_trampoline+0xe
Uptime: 8d22h59m55s
Rebooting...

>> boot log:

Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.1-RELEASE-p8 #0: Tue Mar 13 17:07:05 UTC 2018
    root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
VT(vga): text 80x25
CPU: HammerEM64T (3000.05-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x50653  Family=0x6  Model=0x55  Stepping=3
  Features=0x1f83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS,HTT>
  Features2=0xfffa3203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0xd11f4fbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,CLWB,AVX512CD,AVX512BW>
  Structured Extended Features2=0x8<PKU>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
TSC: P-state invariant, performance statistics
Hypervisor: Origin = "KVMKVMKVM"
real memory  = 5114953728 (4878 MB)
avail memory = 3844890624 (3666 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <AMAZON AMZNAPIC>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 1 core(s) x 2 hardware threads
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff80f5eb40, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <AMAZON AMZNRSDT> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71,0x72-0x77 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xb008-0xb00b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <old, non-VGA display device> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xfe400000-0xfe7fffff at device 3.0 on pci0
vgapci0: Boot video device
nvme0: <Generic NVMe Device> mem 0xfebf0000-0xfebf3fff irq 11 at device 4.0 on pci0
ena0: <ENA adapter> mem 0xfebf4000-0xfebf7fff at device 5.0 on pci0
ena0: Elastic Network Adapter (ENA)ena v0.7.0
ena0: initalize 2 io queues
ena0: Ethernet address: 02:2b:3a:f4:70:8c
ena0: Allocated msix_entries, vectors (cnt: 3)
nvme1: <Generic NVMe Device> mem 0xfebf8000-0xfebfbfff irq 11 at device 31.0 on pci0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
uart0: <Non-standard ns8250 class UART with FIFOs> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (115200,n,8,1)
orm0: <ISA Option ROM> at iomem 0xef000-0xeffff on isa0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
attimer0: Can't map interrupt.
ppc0: cannot reserve I/O port range
ena0: link is UP
ena0: link state changed to UP
Timecounters tick every 1.000 msec
usb_needs_explore_all: no devclass
nvme cam probe device init
nvme0: temperature threshold not supported
nvd0: <Amazon Elastic Block Store> NVMe namespace
nvd0: 20480MB (41943040 512 byte sectors)
nvme1: temperature threshold not supported
nvd1: <Amazon Elastic Block Store> NVMe namespace
GEOM: nvd1: corrupt or invalid GPT detected.
nvd1: 20480MB (41943040 512 byte sectors)
GEOM: nvd1: GPT rejected -- may not be recoverable.
Trying to mount root from ufs:/dev/gpt/rootfs [rw]...

We're seeing this now since we migrated our instance between instance types. We were previously on a c5.9xlarge and recently scaled back down (we'd rebuilt and scaled up earlier in the year to deal with seasonal load). This instance is our primary DB instance, and the kernel panics seem to happen either during the DB dump process (which runs when we have the least DB traffic) or, as has happened once so far, during the peak of the daily load.

Our log output is almost the same as what has already been submitted, except that our current process line reads:

```current process = 12 (irq269: ena0)```

The two previous reports had the same IRQ number and are on the same class of instance types, whereas we're on an M5-class instance type and get a slightly different IRQ number. Also, this was happening on 11.1-p4, but we've since upgraded to p10 and the issue still occurs. I'm planning to upgrade a clone to 11.2-p0 later this week to check whether there's a newer version of the ENA driver (from https://github.com/amzn/amzn-drivers/tree/master/kernel/fbsd/ena) in that build.
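As an aside, the running driver version can be confirmed directly from the boot messages quoted above; a small sketch (the sample dmesg lines are inlined here -- on a live instance, pipe the output of `dmesg` in instead):

```shell
# Extract the ENA driver version string from kernel boot messages.
# On a real instance: dmesg | grep -o 'ena v[0-9.]*'
grep -o 'ena v[0-9.]*' <<'EOF'
ena0: Elastic Network Adapter (ENA)ena v0.7.0
ena0: Ethernet address: 02:2b:3a:f4:70:8c
EOF
# prints: ena v0.7.0
```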
(In reply to Richard Paul from comment #2)

Just a quick grab of our DB server's current throughput with `systat -ifstat -pps`:

ena0   in     4.498 Kp/s   4.498 Kp/s  279.347 Mp
       out   12.956 Kp/s  12.956 Kp/s  674.161 Mp

We have a couple of Varnish servers in front of this platform (r4.2xlarge) which are rock solid, but they don't see anything like this kind of throughput, which is possibly why this is the only server where we're seeing the issue.

Sorry, I'm coming to this late -- somehow I never saw this PR earlier. It's possible that this is an ENA driver bug, but C5 also has the added complication of using an entirely different virtualization platform, and I'm a bit suspicious of the backtrace here. Can one of you try to reproduce this on a different instance type -- m4.16xlarge or r4.* would probably be best -- so we can see if it's specifically an *ENA* problem or a *C5* problem?

Hi Colin,

Thanks for responding to this issue. You're right to point out that this may be down to the difference in hypervisors. M5 seems to be based on the same platform, so for our purposes an r4.2xlarge would be our nearest alternative in the r4 range of instances. We're waiting for the inevitable failure of our current instance, as it seems to be falling over every 6-7 days.

What we need is some kind of reproducible test case to better diagnose the issue. As the other reporters in this ticket say, this can take hours to a day to reproduce -- in our case multiple days -- which makes finding such a test case time-consuming and difficult. That's what I'm currently attempting with a cloned M5 instance of our DB server, running the database dump and then trying to load the server heavily, but it's not producing much in the way of results at the moment.

Okay, with a bit of effort -- this is testing against the v0.7.0 ENA driver on FreeBSD 11.1-p11 -- I got a reboot. But this was not easy to reproduce.
Here's what I did to get this to die... eventually (it took about 3 hours):

* Downsized the instance to an m5.large
* Installed nginx on the DB server clone and started it (config below)
* Installed the phoronix-test-suite and pts/blogbench
* Kicked off a backup of our 115GB database to a local (800GB EBS volume) ZFS partition (the DB is held on a mirrored ZFS set on another pair of 250GB EBS volumes)
* Kicked off a stress run of the phoronix blogbench
* With two t2.medium instances in the same VPC, ran: wrk -d 12h -c 2k -t2 http://10.0.0.10/

Additional configuration etc.:

----------------------------------------------------------------------------------
root@os-upgrade-test-db:~ # setenv PTS_CONCURRENT_TEST_RUNS 8
root@os-upgrade-test-db:~ # setenv TOTAL_LOOP_TIME 30
root@os-upgrade-test-db:~ # phoronix-test-suite stress-run pts/blogbench
Choose Option 3 for Test All Options
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
pkg info nginx-full
nginx-full-1.12.2_11,2
Name           : nginx-full
Version        : 1.12.2_11,2
Installed on   : Thu Jun 28 09:37:29 2018 UTC

___nginx.conf___
worker_processes auto;
events {
    worker_connections 2048;
}
http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    tcp_nopush on;
    keepalive_timeout 65;
    gzip on;
    server {
        listen 80;
        server_name localhost;
        location / {
            root /usr/local/www/nginx;
            index index.html index.htm;
        }
        location = /50x.html {
            root /usr/local/www/nginx-dist;
        }
    }
}
----------------------------------------------------------------------------------

This is just to give you an idea of what we're doing; the backup is actually a whole backup script that does this for each database in the RDBMS and then rsyncs to the offsite server.

__Postgres dump__
sudo -u pgsql pg_dump -j 16 -Fd dbname -f /var/backups/outfile
----------------------------------------------------------------------------------

I missed a couple of configuration options we'd set:

------------------------------------------------------------------------------
sysctl:
kern.ipc.soacceptqueue: 8192
------------------------------------------------------------------------------

------------------------------------------------------------------------------
postgresql.conf:
max_connections = 120
shared_buffers = 2GB
effective_cache_size = 6GB
checkpoint_completion_target = 0.9
checkpoint_timeout = 1h
work_mem = 2MB
maintenance_work_mem = 256MB
max_locks_per_transaction = 128
random_page_cost = 1.1
max_worker_processes = 2
------------------------------------------------------------------------------

I've changed the instance type to an r4.large; I'll tweak some of the postgresql settings for the additional memory on the r4 instance and set it off again to try to get it to fall over.

Just to help you out with pts: to run the benchmark you need to install it using the phoronix-test-suite package:

#~ pkg install phoronix-test-suite
#~ phoronix-test-suite install pts/blogbench

So, I managed to produce sufficient load to force this to happen on an m5.large instance. I have tried to replicate this on an r4.large instance and have failed to do so so far, but I will keep trying. As such it may not be the ENA adapter at all that is causing this issue; read more at http://www.daemonology.net/blog/2017-11-17-FreeBSD-EC2-C5-instances.html

Have you been attaching/detaching EBS volumes while your [mc]5 instances are running? AFAIK the nvme driver is completely stable aside from that.

Hi Colin,

I read the article about these drives earlier this year, thanks. (P.S. is this more usable now in 11.2, or are we still waiting on the ability to hot-remove drives?) Specifically on this test instance, no, I haven't. We did do this on the original C5 server when we wanted to replace an IO2 drive which was costing us a fortune, and had to schedule some downtime to detach it. But on the instance I'm testing, no, I didn't mess about with the volumes.

However, thinking about this, have you tried testing with a larger number of EBS volumes attached, with load going to most of them at the same time? We have: a UFS root volume, 2x mirrored ZFS for the DB, 1x ZFS for logs, and 1x ZFS for local backups.

Sadly nvme hotplug/unplug is still broken in 11.2 -- unfortunately it turned out that some of the people who would have been fixing that were also the people who needed to work on fixing the Spectre/Meltdown/etc. issues, so this got pushed off. Right now I'm hoping that we'll have the bugs worked out in time for 12.0. The extent of the testing I've done is a few buildworlds on a single disk -- I've been busy chasing other issues (e.g., the IPI issue which was causing userland data corruption) so I haven't been able to do much testing here. Any you can do will be much appreciated...

Okay, so we thought this was resolved by the move to an r4.2xlarge. It wasn't; the server page-faulted last night with the same old page fault process/trap number etc. Our next approach, I guess, will be building the latest version of the ENA driver on this instance, since we seem to have ruled out issues with the change in drivers for HVM.

Hi, I'm running 11.2-RELEASE on EC2 Singapore. While no panic has been encountered yet, ENA keeps going down and up every EXACTLY 30 minutes and causes about 0.5% packet loss.
On server 1, a c5.2xlarge with about 50Mbps of traffic:

Aug 2 00:24:40 ip-10-251-18-192 kernel: ena0: device is going DOWN
Aug 2 00:24:40 ip-10-251-18-192 kernel: ena0: device is going UP
Aug 2 00:54:40 ip-10-251-18-192 kernel: ena0: device is going DOWN
Aug 2 00:54:40 ip-10-251-18-192 kernel: ena0: device is going UP
Aug 2 01:24:41 ip-10-251-18-192 kernel: ena0: device is going DOWN
Aug 2 01:24:41 ip-10-251-18-192 kernel: ena0: device is going UP
Aug 2 01:54:40 ip-10-251-18-192 kernel: ena0: device is going DOWN
Aug 2 01:54:41 ip-10-251-18-192 kernel: ena0: device is going UP

On server 2, a c5.large with <1Mbps of traffic:

Aug 2 00:18:00 proxy621 kernel: ena0: device is going DOWN
Aug 2 00:18:00 proxy621 kernel: ena0: device is going UP
Aug 2 00:48:00 proxy621 kernel: ena0: device is going DOWN
Aug 2 00:48:00 proxy621 kernel: ena0: device is going UP
Aug 2 01:18:00 proxy621 kernel: ena0: device is going DOWN
Aug 2 01:18:00 proxy621 kernel: ena0: device is going UP
Aug 2 01:48:00 proxy621 kernel: ena0: device is going DOWN
Aug 2 01:48:00 proxy621 kernel: ena0: device is going UP

`grep ena /var/log/messages` output is here:
server 1: https://nopaste.xyz/?ac03ff403e167965#pg6GYMdb+yReKI4OFiR7vmXqVy7fCsYI5e9TX2hdqTA=
server 2: https://nopaste.xyz/?4b43d08c79c5bc32#gIcXQRyZTFZ0e7M9aW8NQQLatv78UBD3p6Gu7ZQ0QPs=

ENA flapping every 30 minutes is almost certainly due to the MTU being set, thanks to DHCP announcing support for jumbograms. That particular bug is fixed in HEAD (r333454). AFAIK this should not cause any of the other reported issues, but it would be good if someone who is experiencing problems could confirm that they don't happen at 30-minute intervals.

Can anyone reproduce this on HEAD? If this is still broken I'd like to make sure it's fixed before 12.0-RELEASE, but so far this seems quite elusive.

(In reply to Colin Percival from comment #16)

I will try to reproduce later today, or this weekend. I was able to reproduce about a month ago IIRC, but will test with the latest 12-CURRENT checkout.
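For anyone stuck on an affected release before r333454 ships, one possible stopgap -- assuming the flapping really is triggered by the DHCP-announced jumbo MTU, which is this reporter's assumption rather than anything confirmed in this PR -- is to keep dhclient from applying the offered MTU. A sketch (both file paths are standard; whether the `supersede` is needed depends on the local dhclient.conf):

```
# /etc/rc.conf -- request DHCP but pin the MTU explicitly (sketch)
ifconfig_ena0="DHCP mtu 1500"
```

```
# /etc/dhclient.conf -- ignore the server-offered interface-mtu option (sketch)
interface "ena0" {
    supersede interface-mtu 1500;
}
```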
(In reply to pete from comment #16)

I've been able to reproduce this repeatedly (but not predictably) on 11.2 on an r4.large. Not to state the blindingly obvious, but smaller instances such as t2.* aren't affected since they use xn instead of ena. It seems to be most likely at times of high network IO, which again risks stating the forehead-slappingly obvious. :)

Multiple times, the crash included the same backtrace shown in this bug. However, at least once it panicked on a double fault, which, if related, suggests that the bug in ena could be incurring memory corruption. Now granted, I only know of one incidence of a double fault, so it could have been running on a host with faulty RAM or something at the time. However, after each panic I'd stop/start the instance rather than reboot, to provoke it to move to new hardware, so I'm not suggesting that the whole bug is merely from faulty host hardware.

I might beg that the fix be patched into 11.2, or at least included in 11.3 so it won't have to wait for 12. Otherwise, AWS users will find themselves stuck on 11.1, and the approaching EOL of 11.1 will leave them without security updates, which in turn makes this an indirect security issue. However, I understand there are other considerations at play, and very much appreciate the relentless work of the security team (not to mention the work on AWS support and FreeBSD in general).

Probably too much detail: the particular case was our standby MySQL database on an r4.large. It was stable on 11.1, and problematic after I upgraded it to 11.2 (with `freebsd-update upgrade`); after five or so crashes in a month, I downgraded it back to 11.1 (again with `freebsd-update upgrade`), after which it has been perfectly stable for a couple of weeks now. It's in master-master replication with our production replica, and normally gets a fairly low but steady stream of activity from the replication.
However, we have several nightly jobs that crank away at updating a model and cause a large volume of traffic in the replication stream. I don't have proper metrics on bytes/sec, so I have no idea whether it saturates the interface. It's enough that replication falls behind for up to a few hours, but I wouldn't call our system "huge" in terms of network traffic by any means. The reason I included all that detail is to point out: (1) it seems to be a regression between 11.1 and 11.2, (2) r4.* are for sure affected, and (3) the problem may be more likely to trigger on moderate or bursty network traffic with much task-switching between MySQL threads than on a simple stream of a high-speed file transfer, for example.

-Leif

The reason I was asking about HEAD is that we're currently at 12.0-ALPHA5 -- we're going to have 12.0-RELEASE before the release engineering team goes back and does the next release from stable/11 (a.k.a. 11.3-RELEASE). In other words, I'd like to make sure this is fixed in the next release, but the first step towards that is knowing whether it's still broken. There have been some driver updates since 11.2, and one of them might have fixed this accidentally.

(In reply to Colin Percival from comment #19)

Cool. I may be able to clone that machine to 12 later this week and try to reproduce it... I need to finish some urgent work first.

(In reply to Colin Percival from comment #15)

I tested 12.0-ALPHA2 and 12.0-ALPHA5 on c5.large and t3.micro and did not see any ena up/down messages again, so I think this issue has been fixed.

(In reply to Ling from comment #21)

I think the up/down state flapping is unrelated to the panics other people were seeing, so I'd like to know if other people can reproduce the issues they saw.

I have a c5.large EC2 instance I'm running to test this using 12.0-ALPHA5. In the same VPC I have a system using a xn ethernet interface.
I am running iperf3 between these two systems and getting just shy of 1 Gbps network throughput, and ~13 Kp/s. The c5.large system with ena interfaces has not had any problems so far. I've run several iperf3 TCP tests for 10 minutes each with no errors. As mentioned earlier, the interface flapping errors have gone away as well. If there are other artificial benchmarks I should run to help validate that this configuration has stabilized, let me know and I can run them today.

A bit OT, but is there a particular reason for the FreeBSD 11.2 AMI to support C5 instances and not M5?

This has been sat on my to-do list for a while; I'm hoping that if I can get my next job out of the way this week I'll revisit this. Further to my previous posts, we seem to see the reboots happen more regularly on instances faced with moderate memory pressure while also seeing reasonable amounts of writing to disks backed by ZFS datasets. We don't see this where there is only memory pressure -- e.g., on Varnish servers running purely in memory we haven't seen it even though memory usage is very close to 100%. If we double the memory on a crashing instance, the issue goes away. As such, I'm going to attempt to force memory pressure on a test server with an additional disk holding a zpool and ZFS dataset, to try to reproduce this on a recent 12.0 instance.

I have the same problem (ena interface going down and up repeatedly).
# uname -a
FreeBSD db-20 11.2-RELEASE-p4 FreeBSD 11.2-RELEASE-p4 #0: Thu Sep 27 08:16:24 UTC 2018 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

# ifconfig ena0
ena0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=422<TXCSUM,JUMBO_MTU,LRO>
        ether 06:4d:4b:64:e1:86
        hwaddr 06:4d:4b:64:e1:86
        inet6 fe80::44d:4bff:fe64:e186%ena0 prefixlen 64 scopeid 0x1
        inet 10.1.20.20 netmask 0xffffff00 broadcast 10.1.20.255
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active

This is an AWS r5.large instance. It was on 11.1-RELEASE; I upgraded to 11.2 with freebsd-update. But my other instance, an r4.large, is very stable:

# uname -a
FreeBSD web-10 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9 11:55:48 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

# ifconfig ena0
ena0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=422<TXCSUM,JUMBO_MTU,LRO>
        ether 06:01:57:54:03:a2
        hwaddr 06:01:57:54:03:a2
        inet6 fe80::401:57ff:fe54:3a2%ena0 prefixlen 64 scopeid 0x1
        inet 10.1.20.10 netmask 0xffffff00 broadcast 10.1.20.255
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active

@jaehak: That is not this issue, and it has already been rectified for 12. 12 should be out in December, and this problem will go away for you then (and for us -- we're actually seeing disruption on our production systems because of this bug, since the application unexpectedly can't reach the cache and database layers while the network interface is down, and we're hit pretty hard because we receive around 1M requests per day, so there's a lot of opportunity for this to happen).
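The "exactly every 30 minutes" pattern reported above is easy to verify from /var/log/messages; a small sketch that tallies link-down events per hour (sample log lines are inlined here -- on a live system, feed `grep 'going DOWN' /var/log/messages` in instead):

```shell
# Count ena0 link-down events per hour from kernel log lines.
# $3 is the syslog timestamp (HH:MM:SS); bucket by the hour field.
awk '/device is going DOWN/ { split($3, t, ":"); cnt[t[1]]++ }
     END { for (h in cnt) print h, cnt[h] }' <<'EOF'
Aug 2 00:24:40 ip-10-251-18-192 kernel: ena0: device is going DOWN
Aug 2 00:54:40 ip-10-251-18-192 kernel: ena0: device is going DOWN
Aug 2 01:24:41 ip-10-251-18-192 kernel: ena0: device is going DOWN
EOF
```

With the sample input this prints one "hour count" line per hour seen (two events in hour 00, one in hour 01); a steady two-per-hour tally is consistent with the 30-minute DHCP renewal theory.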
I did some testing yesterday but couldn't manage to reproduce the issue on either 11.2 or 12.0-BETA1. However, the problem does still exist on the current 11.2 release: I had a test instance with jails on it in which I was building our application stack (a convoluted stack with a lot of files being uploaded to S3 as part of the build), and I'd been having issues with it rebooting. Yesterday it failed on startup, wanting to drop into single-user mode due to a UFS checksum issue. Obviously that isn't possible on AWS, as you don't get console access, so this instance had to be written off.

Seeing exactly the same thing on m5.large. 100% repeatable (full zfs send/recv from another server).

Dec 21 21:07:10 nfs1 kernel: Fatal trap 12: page fault while in kernel mode
Dec 21 21:07:10 nfs1 kernel: cpuid = 0; apic id = 00
Dec 21 21:07:10 nfs1 kernel: fault virtual address   = 0x1c
Dec 21 21:07:10 nfs1 kernel: fault code              = supervisor write data, page not present
Dec 21 21:07:10 nfs1 kernel: instruction pointer     = 0x20:0xffffffff82269f5c
Dec 21 21:07:10 nfs1 kernel: stack pointer           = 0x0:0xfffffe02259ac180
Dec 21 21:07:10 nfs1 kernel: frame pointer           = 0x0:0xfffffe02259ac260
Dec 21 21:07:10 nfs1 kernel: code segment            = base rx0, limit 0xfffff, type 0x1b
Dec 21 21:07:10 nfs1 kernel:                         = DPL 0, pres 1, long 1, def32 0, gran 1
Dec 21 21:07:10 nfs1 kernel: processor eflags        = interrupt enabled, resume, IOPL = 0
Dec 21 21:07:10 nfs1 kernel: current process         = 12 (irq260: ena0)
Dec 21 21:07:10 nfs1 kernel: trap number             = 12
Dec 21 21:07:10 nfs1 kernel: panic: page fault
Dec 21 21:07:10 nfs1 kernel: cpuid = 0
Dec 21 21:07:10 nfs1 kernel: KDB: stack backtrace:
Dec 21 21:07:10 nfs1 kernel: #0 0xffffffff80b3d577 at kdb_backtrace+0x67
Dec 21 21:07:10 nfs1 kernel: #1 0xffffffff80af6b17 at vpanic+0x177
Dec 21 21:07:10 nfs1 kernel: #2 0xffffffff80af6993 at panic+0x43
Dec 21 21:07:10 nfs1 kernel: #3 0xffffffff80f77fdf at trap_fatal+0x35f
Dec 21 21:07:10 nfs1 kernel: #4 0xffffffff80f78039 at trap_pfault+0x49
Dec 21 21:07:10 nfs1 kernel: #5 0xffffffff80f77807 at trap+0x2c7
Dec 21 21:07:10 nfs1 kernel: #6 0xffffffff80f5808c at calltrap+0x8
Dec 21 21:07:10 nfs1 kernel: #7 0xffffffff80abcd69 at intr_event_execute_handlers+0xe9
Dec 21 21:07:10 nfs1 kernel: #8 0xffffffff80abd047 at ithread_loop+0xe7
Dec 21 21:07:10 nfs1 kernel: #9 0xffffffff80aba093 at fork_exit+0x83
Dec 21 21:07:10 nfs1 kernel: #10 0xffffffff80f58fae at fork_trampoline+0xe
Dec 21 21:07:10 nfs1 kernel: Uptime: 11h49m3s
Dec 21 21:07:10 nfs1 kernel: Rebooting...

This is on 11.2-RELEASE-p7.

"You'll need to try this out on 12.0" will be the response, berend, I think. We have completed our migration from AWS to GCP now, so we can't make any more progress on this.

ena in 12.0-RELEASE is still not stable on AWS. We see these kernel messages a few times per day:

ena_com_prepare_tx() [TID:100766]: Not enough space in the tx queue
ena0: failed to prepare tx bufs
ena_com_prepare_tx() [TID:100766]: Not enough space in the tx queue
ena0: failed to prepare tx bufs
ena_com_prepare_tx() [TID:100378]: Not enough space in the tx queue
ena0: failed to prepare tx bufs
ena_com_prepare_tx() [TID:100376]: Not enough space in the tx queue
ena0: failed to prepare tx bufs
ena_com_prepare_tx() [TID:100844]: Not enough space in the tx queue
ena0: failed to prepare tx bufs
ena_com_prepare_tx() [TID:100765]: Not enough space in the tx queue
ena0: failed to prepare tx bufs
ena_com_prepare_tx() [TID:100363]: Not enough space in the tx queue
ena0: failed to prepare tx bufs
ena_com_prepare_tx() [TID:100523]: Not enough space in the tx queue
ena0: failed to prepare tx bufs
ena0: Keep alive watchdog timeout.
ena0: Trigger reset is on
ena0: device is going DOWN
ena0: device is going UP
ena0: link is UP
ena_com_prepare_tx() [TID:100401]: Not enough space in the tx queue
ena0: failed to prepare tx bufs
ena_com_prepare_tx() [TID:100477]: Not enough space in the tx queue
ena0: failed to prepare tx bufs
ena_com_prepare_tx() [TID:100634]: Not enough space in the tx queue
ena0: failed to prepare tx bufs

Leif, could you open a new PR for that and CC me? I'll get some people to look at it, but I think it's an unrelated issue so I don't want to force them to wade through this entire thread.

I'm experiencing packet loss with ENA and FreeBSD 12.0; relevant bug report here: bug #234754

(In reply to Colin Percival from comment #32)

You bet. Sorry, I was away for a few days. I opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234838

I believe that this patch should fix the underlying problem, which is in the ENA driver: https://reviews.freebsd.org/D24546 -- if you're able to build a custom kernel, please test that patch and report results in that review or via email (cperciva@).

A commit references this bug:

Author: mw
Date: Thu May 7 11:28:40 UTC 2020
New revision: 360777
URL: https://svnweb.freebsd.org/changeset/base/360777

Log:
  Optimize ENA Rx refill for low memory conditions

  Sometimes, especially when there is not much memory in the system left,
  allocating mbuf jumbo clusters (like 9KB or 16KB) can take a lot of time
  and it is not guaranteed that it'll succeed. In that situation, the
  fallback will work, but if the refill needs to take a place for a lot of
  descriptors at once, the time spent in m_getjcl looking for memory can
  cause system unresponsiveness due to high priority of the Rx task. This
  can also lead to driver reset, because Tx cleanup routine is being
  blocked and timer service could detect that Tx packets aren't cleaned up.
  The reset routine can further create another unresponsiveness - Rx rings
  are being refilled there, so m_getjcl will again burn the CPU.
  This was causing NVMe driver timeouts and resets, because network driver
  is having higher priority.

  Instead of 16KB jumbo clusters for the Rx buffers, 9KB clusters are
  enough - ENA MTU is being set to 9K anyway, so it's very unlikely that
  more space than 9KB will be needed. However, 9KB jumbo clusters can
  still cause issues, so by default the page size mbuf cluster will be
  used for the Rx descriptors. This can have a small (~2%) impact on the
  throughput of the device, so to restore original behavior, one must
  change sysctl "hw.ena.enable_9k_mbufs" to "1" in "/boot/loader.conf" file.

  As a part of this patch (important fix), the version of the driver was
  updated to v2.1.2.

Submitted by: cperciva
Reviewed by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: Ido Segev <idose@amazon.com>
Reviewed by: Guy Tzalik <gtzalik@amazon.com>
MFC after: 3 days
PR: 225791, 234838, 235856, 236989, 243531
Differential Revision: https://reviews.freebsd.org/D24546

Changes:
  head/sys/dev/ena/ena.c
  head/sys/dev/ena/ena.h
  head/sys/dev/ena/ena_sysctl.c
  head/sys/dev/ena/ena_sysctl.h

A commit references this bug:

Author: mw
Date: Tue May 12 18:44:41 UTC 2020
New revision: 360985
URL: https://svnweb.freebsd.org/changeset/base/360985

Log:
  MFC r360777:
  Optimize ENA Rx refill for low memory conditions

  Sometimes, especially when there is not much memory in the system left,
  allocating mbuf jumbo clusters (like 9KB or 16KB) can take a lot of time
  and it is not guaranteed that it'll succeed. In that situation, the
  fallback will work, but if the refill needs to take a place for a lot of
  descriptors at once, the time spent in m_getjcl looking for memory can
  cause system unresponsiveness due to high priority of the Rx task. This
  can also lead to driver reset, because Tx cleanup routine is being
  blocked and timer service could detect that Tx packets aren't cleaned up.
  The reset routine can further create another unresponsiveness - Rx rings
  are being refilled there, so m_getjcl will again burn the CPU.
  This was causing NVMe driver timeouts and resets, because network driver
  is having higher priority.

  Instead of 16KB jumbo clusters for the Rx buffers, 9KB clusters are
  enough - ENA MTU is being set to 9K anyway, so it's very unlikely that
  more space than 9KB will be needed. However, 9KB jumbo clusters can
  still cause issues, so by default the page size mbuf cluster will be
  used for the Rx descriptors. This can have a small (~2%) impact on the
  throughput of the device, so to restore original behavior, one must
  change sysctl "hw.ena.enable_9k_mbufs" to "1" in "/boot/loader.conf" file.

  As a part of this patch (important fix), the version of the driver was
  updated to v2.1.2.

Submitted by: cperciva
PR: 225791, 234838, 235856, 236989, 243531

Changes:
_U  stable/12/
  stable/12/sys/dev/ena/ena.c
  stable/12/sys/dev/ena/ena.h
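Per the commit message above, driver v2.1.2 defaults to page-size mbuf clusters for the Rx ring; operators who want the previous 9KB-cluster behavior back (accepting the low-memory allocation stalls the commit describes, in exchange for ~2% throughput) can set the loader tunable the patch introduces:

```
# /boot/loader.conf
# Restore pre-2.1.2 ENA Rx behavior (9KB mbuf clusters), per r360777.
hw.ena.enable_9k_mbufs="1"
```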