Created attachment 223180
TrueNAS Core kernel panic due to jails with high network load
My TrueNAS Core system gets a kernel panic when I start backups from Proxmox over the 10G link. After some analysis I found that it doesn't happen when I turn off the two jails! The kernel panic only occurs while the jails are running. You can see this message on the console:
The server then restarts.
Here are the technical details of the system:
2x Intel (R) Xeon (R) CPU E5-2620 0 @ 2.00GHz
2x HP H220 HBA
10x WD RED 4TB
1x dual-port 10G Chelsio network card
2 jails: Plex and ZoneMinder
I have been seeing the same behavior since my upgrade to TrueNAS 12.0 last October, but I couldn't put my finger on it. Every few days, sometimes after only a few hours, my server would stall...
My setup is neat and simple. I suspected every component and swapped the board, CPU, RAM, NIC (SFP+ 10G and Intel 1G), HBA, cables... everything, using a second "sleeping" computer.
Only the HDDs and one Plex jail remained...
Right before jumping off a bridge I decided to stop the Plex Jail for a few days. To my surprise, the server has been rock solid, transferring 650GB on the network many times over 4 days.
Today, I decided to recreate the Plex jail. In the process, I had to start and stop it a few times for configuration purposes and voilà! Bang! Everything hangs!
The jail had been running for less than 10 minutes :(
Luckily, I was able to grab a picture of a partial error message:
"Apr 1 18:01:04 corpus kernel: mlx4_en mlx4_core0: Internal error detected
Despite the date, this is not an April Fools' joke. It's a victory to find out that the jail was the problem!!
For the record, this server had been running perfectly since the days of FreeNAS 9.x and was upgraded to FreeNAS 11 over the years, up to 11.3-U5, without a hitch.
I now have two options: reinstall 11.3-U5 and rebuild the whole ZFS pool, or scrap all jails and convert Plex to a virtual machine within TrueNAS. What a nice weekend ahead :)