Bug 251154 - mlx4en sometimes panics on ACPI S3 suspend/resume
Summary: mlx4en sometimes panics on ACPI S3 suspend/resume
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2020-11-15 11:54 UTC by Greg V
Modified: 2020-11-17 18:28 UTC
CC List: 3 users

See Also:


Description Greg V 2020-11-15 11:54:15 UTC
mlx4 does not currently have proper suspend/resume support, but after resume the device often recovers through a successful reset.
Sometimes, however, this fails: for example, when the driver attempts a reset right before the actual suspend happens, the reset fails and the kernel panics:

[328] mlx4_core0: PCI device did not come back after reset, aborting
[328] mlx4_core0: Fail to reset HCA
[328] panic: BUG ON err != 0 failed at [..]/sys/dev/mlx4/mlx4_core/mlx4_catas.c:187
[328] cpuid = 12
[328] time = 1605438800
[328] KDB: stack backtrace:
[328] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e97a6810
[328] vpanic() at vpanic+0x182/frame 0xfffffe00e97a6860
[328] panic() at panic+0x43/frame 0xfffffe00e97a68c0
[328] mlx4_enter_error_state() at mlx4_enter_error_state+0x482/frame 0xfffffe00e97a6900
[328] __mlx4_cmd() at __mlx4_cmd+0x60e/frame 0xfffffe00e97a6980
[328] mlx4_en_DUMP_ETH_STATS() at mlx4_en_DUMP_ETH_STATS+0x7d/frame 0xfffffe00e97a6a40
[328] mlx4_en_do_get_stats() at mlx4_en_do_get_stats+0x8f/frame 0xfffffe00e97a6aa0
[328] linux_work_fn() at linux_work_fn+0xe1/frame 0xfffffe00e97a6b00
[328] taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe00e97a6b80
[328] taskqueue_thread_loop() at taskqueue_thread_loop+0xac/frame 0xfffffe00e97a6bb0
[328] fork_exit() at fork_exit+0x7d/frame 0xfffffe00e97a6bf0
[328] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e97a6bf0