I've got an older Acer C720 with r314251, which was not booted for some time, and now panics on boot, also in single user mode, saying: ... Dec 30 19:54:26 c720-r314251 kernel: ada0: Command Queueing enabled Dec 30 19:54:26 c720-r314251 kernel: ada0: 244198MB (500118192 512 byte sectors) Dec 30 19:54:26 c720-r314251 kernel: WARNING: WITNESS option enabled, expect reduced performance. Dec 30 19:54:26 c720-r314251 kernel: Trying to mount root from ufs:/dev/ada0p2 [rw,noatime]... panic: invalid bcd 194 ... The message comes from $ find * -type f -exec fgrep "invalid bcd" {} /dev/null \; sys/sys/libkern.h: ("invalid bcd %d", bcd)); $ vim sys/sys/libkern.h ... #define LIBKERN_LEN_BCD2BIN 154 #define LIBKERN_LEN_BIN2BCD 100 #define LIBKERN_LEN_HEX2ASCII 36 static inline u_char bcd2bin(int bcd) { KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN, ("invalid bcd %d", bcd)); return (bcd2bin_data[bcd]); } Additional information are here in these mail threads: http://freebsd.1045724.x6.nabble.com/panic-invalid-bcd-xxx-td6170480.html http://freebsd.1045724.x6.nabble.com/panic-invalid-bcd-194-td6228981.html
I began to work on this, with the idea of adding some common code that all RTC drivers can use to validate the bcd. Then, when I looked at atrtc.c I discovered it already has code to validate the bcd values, added in r314936. That means that this error just shouldn't be possible now unless the current validation is incorrect (it looks okay to me), or the compiler is generating bad code (which I tried to investigate, but my x86 asm skills are 25 years out of date).
I have adopted the patches from ian@ to r314251 and I'm now testing them. Have to wait, until the CMOS battery of the C720 which raised the problem goes off again, though.
A commit references this bug: Author: ian Date: Sun Jan 14 17:01:38 UTC 2018 New revision: 327971 URL: https://svnweb.freebsd.org/changeset/base/327971 Log: Add RTC clock conversions for BCD values, with non-panic validation. RTC clock hardware frequently uses BCD numbers. Currently the low-level bcd2bin() and bin2bcd() functions will KASSERT if given out-of-range BCD values. Every RTC driver must implement its own code for validating the unreliable data coming from the hardware to avoid a potential kernel panic. This change introduces two new functions, clock_bcd_to_ts() and clock_ts_to_bcd(). The former validates its inputs and returns EINVAL if any values are out of range. The latter guarantees the returned data will be valid BCD in a known format (4-digit years, etc). A new bcd_clocktime structure is used with the new functions. It is similar to the original clocktime structure, but defines the fields holding BCD values as uint8_t (uint16_t for year), and adds a PM flag for handling hours using AM/PM mode. PR: 224813 Differential Revision: https://reviews.freebsd.org/D13730 (no reviewers) Changes: head/sys/kern/subr_clock.c head/sys/sys/clock.h
I can ACK: The patches from ian@ which I've adopted to r314251, fixed the problem, the system does not PANIC when CMOS battery is empty and RTC is bad.
A commit references this bug: Author: ian Date: Mon Jan 15 16:40:44 UTC 2018 New revision: 328005 URL: https://svnweb.freebsd.org/changeset/base/328005 Log: Convert the x86 RTC driver to use new validated BCD<->timespec conversions. New common routines were added to kern/subr_clock.c for converting between calendrical time expressed in BCD and struct timespec. The new functions return EINVAL on error, as expected when the clock hardware does not provide valid time. PR: 224813 Differential Revision: https://reviews.freebsd.org/D13731 (no reviewers) Changes: head/sys/x86/isa/atrtc.c
Fixes applied to 12-current. I'll close the PR after MFC'ing the fixes to earlier branches in a few days.
A commit references this bug: Author: ian Date: Sat Mar 24 20:40:16 UTC 2018 New revision: 331496 URL: https://svnweb.freebsd.org/changeset/base/331496 Log: MFC r306288, r314936, r325527, r327971, r328005, r328039, r328068-r328069, r328301-r328303 r306288: Fix ds1307 probing 'compat' can never be NULL, because the compatible check loop ends when compat->ocd_str is NULL. This causes ds1307 to attach to any unclaimed i2c device. r314936: Validate values read from the RTC before trying BCD decoding Submitted by: cem Reported by: Michael Gmelin <freebsd@grem.de> Tested by: Oleksandr Tymoshenko <gonzo@bluezbox.com> Sponsored by: Dell EMC r325527: DS1307: Add the mcp7941x enable bit Summary: Existing code recognizes the mcp7941x RTC, but this RTC has an enable bit at the same location as the "Clock Halt" bit on the ds1307, with an opposite assertion (set == on, whereas CH set == clock stopped). Thus the current code halts the clock, with no way to enable it. Reviewed By: ian Differential Revision: https://reviews.freebsd.org/D12961 r327971: Add RTC clock conversions for BCD values, with non-panic validation. RTC clock hardware frequently uses BCD numbers. Currently the low-level bcd2bin() and bin2bcd() functions will KASSERT if given out-of-range BCD values. Every RTC driver must implement its own code for validating the unreliable data coming from the hardware to avoid a potential kernel panic. This change introduces two new functions, clock_bcd_to_ts() and clock_ts_to_bcd(). The former validates its inputs and returns EINVAL if any values are out of range. The latter guarantees the returned data will be valid BCD in a known format (4-digit years, etc). A new bcd_clocktime structure is used with the new functions. It is similar to the original clocktime structure, but defines the fields holding BCD values as uint8_t (uint16_t for year), and adds a PM flag for handling hours using AM/PM mode. PR: 224813 Differential Revision: https://reviews.freebsd.org/D13730 (no reviewers) r328005: Convert the x86 RTC driver to use new validated BCD<->timespec conversions. New common routines were added to kern/subr_clock.c for converting between calendrical time expressed in BCD and struct timespec. The new functions return EINVAL on error, as expected when the clock hardware does not provide valid time. PR: 224813 Differential Revision: https://reviews.freebsd.org/D13731 (no reviewers) r328039: Add static inline rtcin_locked() and rtcout_locked() functions for doing a related series of operations without doing a lock/unlock for each byte. Use them when reading and writing the entire set of time registers. The original rtcin() and writertc() functions which do lock/unlock on each byte still exist, because they are public and called by outside code. r328068: Move some code around and rename a couple variables; no functional changes. The static atrtc_set() function was called only from clock_settime(), so just move its contents entirely into clock_settime() and delete atrtc_set(). Rename the struct bcd_clocktime variables from 'ct' to 'bct'. I had originally wanted to emphasize how identical the clocktime and bcd_clocktime structs were, but things evolved to the point where the structs are not at all identical anymore, so now emphasizing the difference seems better. r328069: Remove redundant critical_enter/exit() calls. The block of code delimited by these calls is now protected by a spin mutex (obscured within the RTC_LOCK/RTC_UNLOCK macros). Reported by: bde@ r328301: Switch to using the bcd_clocktime conversion functinos that validate the BCD data without panicking, and have common code for handling AM/PM mode. r328302: Switch to using the bcd_clocktime conversion functions that validate the BCD data without panicking, and have common code for handling AM/PM mode. r328303: Switch to using the bcd_clocktime conversion functions that validate the BCD data without panicking, and have common code for handling AM/PM mode. Changes: _U stable/11/ stable/11/sys/dev/iicbus/ds1307.c stable/11/sys/dev/iicbus/ds1307reg.h stable/11/sys/dev/iicbus/ds13rtc.c stable/11/sys/dev/iicbus/nxprtc.c stable/11/sys/kern/subr_clock.c stable/11/sys/sys/clock.h stable/11/sys/x86/isa/atrtc.c