Bug 251154

Summary: mlx4en sometimes panics on ACPI S3 suspend/resume
Product: Base System
Component: kern
Status: Open ---
Severity: Affects Only Me
Priority: ---
Version: CURRENT
Hardware: Any
OS: Any
Reporter: Val Packett <val>
Assignee: freebsd-net (Nobody) <net>
CC: carton, emaste, hselasky, kib
Keywords: crash

Description Val Packett 2020-11-15 11:54:15 UTC
mlx4 doesn't currently have proper suspend/resume support, but it often comes back after resume with a successful reset.
Sometimes it fails instead, e.g. when it attempts a reset right before the actual suspend happens and that reset fails, which ends in a panic:

[328] mlx4_core0: PCI device did not come back after reset, aborting
[328] mlx4_core0: Fail to reset HCA
[328] panic: BUG ON err != 0 failed at [..]/sys/dev/mlx4/mlx4_core/mlx4_catas.c:187
[328] cpuid = 12
[328] time = 1605438800
[328] KDB: stack backtrace:
[328] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e97a6810
[328] vpanic() at vpanic+0x182/frame 0xfffffe00e97a6860
[328] panic() at panic+0x43/frame 0xfffffe00e97a68c0
[328] mlx4_enter_error_state() at mlx4_enter_error_state+0x482/frame 0xfffffe00e97a6900
[328] __mlx4_cmd() at __mlx4_cmd+0x60e/frame 0xfffffe00e97a6980
[328] mlx4_en_DUMP_ETH_STATS() at mlx4_en_DUMP_ETH_STATS+0x7d/frame 0xfffffe00e97a6a40
[328] mlx4_en_do_get_stats() at mlx4_en_do_get_stats+0x8f/frame 0xfffffe00e97a6aa0
[328] linux_work_fn() at linux_work_fn+0xe1/frame 0xfffffe00e97a6b00
[328] taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe00e97a6b80
[328] taskqueue_thread_loop() at taskqueue_thread_loop+0xac/frame 0xfffffe00e97a6bb0
[328] fork_exit() at fork_exit+0x7d/frame 0xfffffe00e97a6bf0
[328] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e97a6bf0
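
Reading the backtrace bottom-up: the periodic statistics work item issues a command, the command path enters the error state, the reset fails, and the assertion reported at mlx4_catas.c:187 panics the machine. The following is only a userland model of a non-fatal alternative, not the driver's code and not a proposed patch. Every name in it (model_dev, model_reset, model_enter_error_state, model_dump_stats) is hypothetical and exists only for illustration. It assumes that a failed reset can be handled by marking the device unusable and refusing further commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userland model only: none of these names exist in the mlx4 driver. */
struct model_dev {
    bool pci_responds;   /* false models "PCI device did not come back after reset" */
    bool in_error_state; /* set once the device has been given up on */
};

/* Models a reset attempt; returns 0 on success, -1 on failure. */
static int model_reset(const struct model_dev *dev)
{
    assert(dev != NULL);
    return dev->pci_responds ? 0 : -1;
}

/*
 * Models entering the error state without asserting on the reset result:
 * a failed reset marks the device unusable instead of halting the system.
 */
static int model_enter_error_state(struct model_dev *dev)
{
    int err = model_reset(dev);

    if (err != 0) {
        fprintf(stderr, "model: reset failed, marking device unusable\n");
        dev->in_error_state = true;
        return err;
    }
    dev->in_error_state = false;
    return 0;
}

/* Models a periodic statistics command like the one in the backtrace. */
static int model_dump_stats(struct model_dev *dev)
{
    if (dev->in_error_state)
        return -1; /* refuse commands while the device is unusable */
    if (!dev->pci_responds)
        return model_enter_error_state(dev);
    return 0;
}
```

In this model, calling model_dump_stats() on a device whose pci_responds is false returns -1 and sets in_error_state, and later calls fail early without attempting another reset. Whether the real driver can safely stay in such a state across suspend is an open question here.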