| Summary: | File corruption during reboot on Virtio-based FreeBSD guest | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Jan Siero <jan> |
| Component: | kern | Assignee: | Bryan Venteicher <bryanv> |
| Status: | Closed Not A Bug | ||
| Severity: | Affects Only Me | CC: | bryanv |
| Priority: | --- | ||
| Version: | 9.3-RELEASE | ||
| Hardware: | amd64 | ||
| OS: | Any | ||
Description
Jan Siero
2014-09-11 18:48:27 UTC
The last time the corruption occurred, MD5 hashes of the library files in /usr/lib were saved shortly before reboot (after running freebsd-update install). After reboot, the MD5 hashes of some files differed from the ones saved before reboot, so the file corruption must have taken place during reboot.

Further analysis of the last occurrence:

On the host system, backtraces were found that relate to the host's Broadcom (bnx2) driver. The backtraces were tied to the process ID of the FreeBSD guest. Backtrace entries started about 50 minutes before reboot and stopped at the time of reboot. The KVM FreeBSD guest uses the Virtio NIC network model. The Linux KVM website, however, suggests using the e1000 network model: http://www.linux-kvm.org/page/Guest_Support_Status#FreeBSD

The VPS guest had been shut down for 4 days. After mounting a DVD and starting the FreeBSD VPS guest (from hard disk), the corruption had disappeared: no errors, and the MD5 hashes of the previously corrupted files were OK. The host system was not restarted in the meantime. Temporary corruption must have occurred on the KVM host system, possibly triggered by the Virtio NIC driver.

That makes me suspect some HW problem, likely RAM. I don't see how the network driver could have caused some transient error like this either.

Corruption reoccurred after reinstallation of the FreeBSD guest system with the e1000 NIC driver, so the Virtio NIC driver can be excluded as the cause of the corruption. The corruption followed the same pattern as previous occurrences: in a single file, 4K blocks nr. 160 - 164 were corrupted during reboot and became exact copies of the following 4K block (nr. 165). The corruption is likely to be HW related.
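
For anyone trying to reproduce the checks described above, here is a minimal Python sketch of the two steps: recording MD5 hashes of a directory tree before reboot and comparing them afterwards, and scanning a file for 4K blocks that are byte-identical to the block that follows them. The function names (`snapshot_md5`, `changed_files`, `blocks_equal_to_next`) are made up for illustration and are not from the report; the reporter's actual tooling is not stated.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # the report describes corruption in units of 4K blocks


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_md5(root):
    """Map each regular file under root (relative path) to its MD5 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; hash regular files only.
            if os.path.isfile(full) and not os.path.islink(full):
                result[os.path.relpath(full, root)] = md5_of_file(full)
    return result


def changed_files(before, after):
    """Return sorted paths whose digest differs or that exist in only one snapshot."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))


def blocks_equal_to_next(data, block_size=BLOCK_SIZE):
    """Return indices i where block i is byte-identical to block i + 1.

    Only complete blocks are compared; a trailing partial block is ignored.
    """
    count = len(data) // block_size
    hits = []
    for i in range(count - 1):
        current = data[i * block_size:(i + 1) * block_size]
        following = data[(i + 1) * block_size:(i + 2) * block_size]
        if current == following:
            hits.append(i)
    return hits
```

A snapshot taken with `snapshot_md5("/usr/lib")` before reboot would have to be stored somewhere that survives the reboot (for example serialized to a file on another machine) and then compared with a fresh snapshot via `changed_files`. For a file flagged as changed, `blocks_equal_to_next` would report indices 160 through 164 if those blocks had all become copies of block 165. Note that healthy files can also contain identical adjacent blocks (for example zero-filled regions), so a hit is only a hint, not proof of corruption.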