Bug 216906 - [iSCSI] Bad CRC may not be correctly handled
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-scsi (Nobody)
Reported: 2017-02-08 08:42 UTC by Ben RUBSON
Modified: 2017-11-04 13:29 UTC
Description Ben RUBSON 2017-02-08 08:42:09 UTC
Hello,

While adding hardware CRC32C support to FreeBSD (#216467), we found that iSCSI CRC errors may not be handled correctly.

When one version of the #216467 patch generated bad CRCs, the iSCSI test disk behaved as follows:
the disk connected successfully (the CRCs were presumably correct at that point), but as soon as there was some traffic the disk hung and could not be recovered; a reboot was needed.

From /var/log/messages:
kernel: WARNING: 192.168.2.1 (iqn.2012-06.srv1:rT1): no ping reply (NOP-In) after 5 seconds; reconnecting
srv1 last message repeated 31 times
srv1 last message repeated 42 times

No "digest failed" message was logged.

Should we try to reproduce and investigate this?

Many thanks!

Ben
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2017-02-08 16:46:32 UTC
I believe this was due to an infinite loop (bug) in that CRC code, not bad checksums.  However, if someone wants to verify this one way or another, I'd suggest adding a fail point (fail(9)) in iSCSI to forcibly fail checksums in testing.
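
As a rough userland illustration of the fail point idea (this is not the fail(9) kernel API, and none of the names below come from the FreeBSD iSCSI code), the sketch computes CRC32C in software and adds a hypothetical `force_failure` switch that makes the digest check fail on demand. The point is only to show the shape of such a test hook: the receive path can be driven into its "bad digest" branch without corrupting any data.

```python
# Userland analogy only; DigestChecker and force_failure are hypothetical names.

def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


class DigestChecker:
    """Verifies a received digest, with a switch that forces failures."""

    def __init__(self) -> None:
        # Hypothetical stand-in for a fail point being armed.
        self.force_failure = False

    def verify(self, payload: bytes, received_digest: int) -> bool:
        if self.force_failure:
            # Injected failure: behave as if the digest did not match.
            return False
        return crc32c(payload) == received_digest


checker = DigestChecker()
payload = b"123456789"
digest = crc32c(payload)
```

With the switch off, `checker.verify(payload, digest)` accepts the payload; with `checker.force_failure = True` the same call rejects it, which is the condition under which one would then watch whether the session recovers or hangs.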
Comment 2 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-04-11 19:57:49 UTC
Ben, can you reproduce this?
Comment 3 Ben RUBSON 2017-04-13 08:37:08 UTC
I would say no, I cannot reproduce it, as the issue was triggered by a bug in the #216467 patch (which has of course been corrected).
Conrad says he believes this was due to an infinite loop.
But don't you think it would be worth making sure that iSCSI handles bad checksums correctly, without hanging disks and without requiring a server reboot?
However, I am not sure how to force bad checksums in order to run this test...
Thank you!
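
One way to picture "forcing a bad checksum" outside the kernel is to flip a bit in data that already carries a CRC32C trailer and confirm the receiver notices. The sketch below is only that: the framing (payload followed by a 4-byte little-endian digest) is an assumption made for the example and is not the iSCSI PDU layout, and `frame`, `check` and `flip_bit` are hypothetical helper names.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def frame(payload: bytes) -> bytes:
    """Append the digest to the payload (illustrative framing, not iSCSI's)."""
    return payload + struct.pack("<I", crc32c(payload))


def check(framed: bytes) -> bool:
    """Recompute the digest over the payload and compare with the trailer."""
    payload, trailer = framed[:-4], framed[-4:]
    (received,) = struct.unpack("<I", trailer)
    return crc32c(payload) == received


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


good = frame(b"iSCSI test payload")
bad = flip_bit(good, 3)  # simulate corruption in transit
```

Here `check(good)` passes and `check(bad)` fails. Because a CRC detects every single-bit error, flipping any one bit of `good`, in the payload or in the trailer, is rejected by `check`.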
Comment 4 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-11-04 13:29:49 UTC
The code was tested at some point (before it was first committed), it's quite straightforward, and I see no reason to believe it somehow got broken since then.