Bug 224813 - system panics on boot w/ "panic: invalid bcd nnn" due to broken RTC values
Summary: system panics on boot w/ "panic: invalid bcd nnn" due to broken RTC values
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: amd64 Any
: --- Affects Some People
Assignee: Ian Lepore
URL: https://reviews.freebsd.org/D13730, h...
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-01 09:10 UTC by Matthias Apitz
Modified: 2018-03-24 20:44 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Apitz 2018-01-01 09:10:51 UTC
I've got an older Acer C720 with r314251, which was not booted for some time,
and now panics on boot, also in single user mode, saying:

...
Dec 30 19:54:26 c720-r314251 kernel: ada0: Command Queueing enabled
Dec 30 19:54:26 c720-r314251 kernel: ada0: 244198MB (500118192 512 byte sectors)
Dec 30 19:54:26 c720-r314251 kernel: WARNING: WITNESS option enabled, expect reduced performance.
Dec 30 19:54:26 c720-r314251 kernel: Trying to mount root from ufs:/dev/ada0p2 [rw,noatime]...
panic: invalid bcd 194
...

The message comes from 

$ find * -type f -exec fgrep "invalid bcd" {} /dev/null \;
sys/sys/libkern.h:	    ("invalid bcd %d", bcd));

$ vim sys/sys/libkern.h
...
#define LIBKERN_LEN_BCD2BIN     154
#define LIBKERN_LEN_BIN2BCD     100
#define LIBKERN_LEN_HEX2ASCII   36

static inline u_char
bcd2bin(int bcd)
{

        KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN,
            ("invalid bcd %d", bcd));
        return (bcd2bin_data[bcd]);
}

Additional information are here in these mail threads:

http://freebsd.1045724.x6.nabble.com/panic-invalid-bcd-xxx-td6170480.html
http://freebsd.1045724.x6.nabble.com/panic-invalid-bcd-194-td6228981.html
Comment 1 Ian Lepore freebsd_committer freebsd_triage 2018-01-01 16:20:35 UTC
I began to work on this, with the idea of adding some common code that all RTC drivers can use to validate the bcd.  Then, when I looked at atrtc.c I discovered it already has code to validate the bcd values, added in r314936.  That means that this error just shouldn't be possible now unless the current validation is incorrect (it looks okay to me), or the compiler is generating bad code (which I tried to investigate, but my x86 asm skills are 25 years out of date).
Comment 2 Matthias Apitz 2018-01-03 16:11:31 UTC
I have adopted the patches from ian@ to r314251 and I'm now testing them. Have to wait, until the CMOS battery of the C720 which raised the problem goes off again, though.
Comment 3 commit-hook freebsd_committer freebsd_triage 2018-01-14 17:02:11 UTC
A commit references this bug:

Author: ian
Date: Sun Jan 14 17:01:38 UTC 2018
New revision: 327971
URL: https://svnweb.freebsd.org/changeset/base/327971

Log:
  Add RTC clock conversions for BCD values, with non-panic validation.

  RTC clock hardware frequently uses BCD numbers.  Currently the low-level
  bcd2bin() and bin2bcd() functions will KASSERT if given out-of-range BCD
  values.  Every RTC driver must implement its own code for validating the
  unreliable data coming from the hardware to avoid a potential kernel panic.

  This change introduces two new functions, clock_bcd_to_ts() and
  clock_ts_to_bcd().  The former validates its inputs and returns EINVAL if any
  values are out of range. The latter guarantees the returned data will be
  valid BCD in a known format (4-digit years, etc).

  A new bcd_clocktime structure is used with the new functions.  It is similar
  to the original clocktime structure, but defines the fields holding BCD
  values as uint8_t (uint16_t for year), and adds a PM flag for handling hours
  using AM/PM mode.

  PR:		224813
  Differential Revision:	https://reviews.freebsd.org/D13730 (no reviewers)

Changes:
  head/sys/kern/subr_clock.c
  head/sys/sys/clock.h
Comment 4 Matthias Apitz 2018-01-14 20:20:23 UTC
I can ACK: The patches from ian@ which I've adopted to r314251, fixed the problem, the system does not PANIC when CMOS battery is empty and RTC is bad.
Comment 5 commit-hook freebsd_committer freebsd_triage 2018-01-15 16:41:42 UTC
A commit references this bug:

Author: ian
Date: Mon Jan 15 16:40:44 UTC 2018
New revision: 328005
URL: https://svnweb.freebsd.org/changeset/base/328005

Log:
  Convert the x86 RTC driver to use new validated BCD<->timespec conversions.

  New common routines were added to kern/subr_clock.c for converting between
  calendrical time expressed in BCD and struct timespec. The new functions
  return EINVAL on error, as expected when the clock hardware does not provide
  valid time.

  PR:		224813
  Differential Revision:	https://reviews.freebsd.org/D13731 (no reviewers)

Changes:
  head/sys/x86/isa/atrtc.c
Comment 6 Ian Lepore freebsd_committer freebsd_triage 2018-01-15 17:00:21 UTC
Fixes applied to 12-current.  I'll close the PR after MFC'ing the fixes to earlier branches in a few days.
Comment 7 commit-hook freebsd_committer freebsd_triage 2018-03-24 20:40:39 UTC
A commit references this bug:

Author: ian
Date: Sat Mar 24 20:40:16 UTC 2018
New revision: 331496
URL: https://svnweb.freebsd.org/changeset/base/331496

Log:
  MFC r306288, r314936, r325527, r327971, r328005, r328039, r328068-r328069,
      r328301-r328303

  r306288:
  Fix ds1307 probing

  'compat' can never be NULL, because the compatible check loop ends when
  compat->ocd_str is NULL.  This causes ds1307 to attach to any unclaimed i2c
  device.

  r314936:
  Validate values read from the RTC before trying BCD decoding

  Submitted by:	cem
  Reported by:	Michael Gmelin <freebsd@grem.de>
  Tested by:	Oleksandr Tymoshenko <gonzo@bluezbox.com>
  Sponsored by:	Dell EMC

  r325527:
  DS1307: Add the mcp7941x enable bit

  Summary:
  Existing code recognizes the mcp7941x RTC, but this RTC has an
  enable bit at the same location as the "Clock Halt" bit on the ds1307, with an
  opposite assertion (set == on, whereas CH set == clock stopped).  Thus the
  current code halts the clock, with no way to enable it.

  Reviewed By:	ian
  Differential Revision: https://reviews.freebsd.org/D12961

  r327971:
  Add RTC clock conversions for BCD values, with non-panic validation.

  RTC clock hardware frequently uses BCD numbers.  Currently the low-level
  bcd2bin() and bin2bcd() functions will KASSERT if given out-of-range BCD
  values.  Every RTC driver must implement its own code for validating the
  unreliable data coming from the hardware to avoid a potential kernel panic.

  This change introduces two new functions, clock_bcd_to_ts() and
  clock_ts_to_bcd().  The former validates its inputs and returns EINVAL if any
  values are out of range. The latter guarantees the returned data will be
  valid BCD in a known format (4-digit years, etc).

  A new bcd_clocktime structure is used with the new functions.  It is similar
  to the original clocktime structure, but defines the fields holding BCD
  values as uint8_t (uint16_t for year), and adds a PM flag for handling hours
  using AM/PM mode.

  PR:		224813
  Differential Revision:	https://reviews.freebsd.org/D13730 (no reviewers)

  r328005:
  Convert the x86 RTC driver to use new validated BCD<->timespec conversions.

  New common routines were added to kern/subr_clock.c for converting between
  calendrical time expressed in BCD and struct timespec. The new functions
  return EINVAL on error, as expected when the clock hardware does not provide
  valid time.

  PR:		224813
  Differential Revision:	https://reviews.freebsd.org/D13731 (no reviewers)

  r328039:
  Add static inline rtcin_locked() and rtcout_locked() functions for doing a
  related series of operations without doing a lock/unlock for each byte.
  Use them when reading and writing the entire set of time registers.

  The original rtcin() and writertc() functions which do lock/unlock on each
  byte still exist, because they are public and called by outside code.

  r328068:
  Move some code around and rename a couple variables; no functional changes.

  The static atrtc_set() function was called only from clock_settime(), so
  just move its contents entirely into clock_settime() and delete atrtc_set().

  Rename the struct bcd_clocktime variables from 'ct' to 'bct'.  I had
  originally wanted to emphasize how identical the clocktime and bcd_clocktime
  structs were, but things evolved to the point where the structs are not at
  all identical anymore, so now emphasizing the difference seems better.

  r328069:
  Remove redundant critical_enter/exit() calls.  The block of code delimited
  by these calls is now protected by a spin mutex (obscured within the
  RTC_LOCK/RTC_UNLOCK macros).

  Reported by:	bde@

  r328301:
  Switch to using the bcd_clocktime conversion functinos that validate the BCD
  data without panicking, and have common code for handling AM/PM mode.

  r328302:
  Switch to using the bcd_clocktime conversion functions that validate the BCD
  data without panicking, and have common code for handling AM/PM mode.

  r328303:
  Switch to using the bcd_clocktime conversion functions that validate the BCD
  data without panicking, and have common code for handling AM/PM mode.

Changes:
_U  stable/11/
  stable/11/sys/dev/iicbus/ds1307.c
  stable/11/sys/dev/iicbus/ds1307reg.h
  stable/11/sys/dev/iicbus/ds13rtc.c
  stable/11/sys/dev/iicbus/nxprtc.c
  stable/11/sys/kern/subr_clock.c
  stable/11/sys/sys/clock.h
  stable/11/sys/x86/isa/atrtc.c
Comment 8 commit-hook freebsd_committer freebsd_triage 2018-03-24 20:40:42 UTC
A commit references this bug:

Author: ian
Date: Sat Mar 24 20:40:16 UTC 2018
New revision: 331496
URL: https://svnweb.freebsd.org/changeset/base/331496

Log:
  MFC r306288, r314936, r325527, r327971, r328005, r328039, r328068-r328069,
      r328301-r328303

  r306288:
  Fix ds1307 probing

  'compat' can never be NULL, because the compatible check loop ends when
  compat->ocd_str is NULL.  This causes ds1307 to attach to any unclaimed i2c
  device.

  r314936:
  Validate values read from the RTC before trying BCD decoding

  Submitted by:	cem
  Reported by:	Michael Gmelin <freebsd@grem.de>
  Tested by:	Oleksandr Tymoshenko <gonzo@bluezbox.com>
  Sponsored by:	Dell EMC

  r325527:
  DS1307: Add the mcp7941x enable bit

  Summary:
  Existing code recognizes the mcp7941x RTC, but this RTC has an
  enable bit at the same location as the "Clock Halt" bit on the ds1307, with an
  opposite assertion (set == on, whereas CH set == clock stopped).  Thus the
  current code halts the clock, with no way to enable it.

  Reviewed By:	ian
  Differential Revision: https://reviews.freebsd.org/D12961

  r327971:
  Add RTC clock conversions for BCD values, with non-panic validation.

  RTC clock hardware frequently uses BCD numbers.  Currently the low-level
  bcd2bin() and bin2bcd() functions will KASSERT if given out-of-range BCD
  values.  Every RTC driver must implement its own code for validating the
  unreliable data coming from the hardware to avoid a potential kernel panic.

  This change introduces two new functions, clock_bcd_to_ts() and
  clock_ts_to_bcd().  The former validates its inputs and returns EINVAL if any
  values are out of range. The latter guarantees the returned data will be
  valid BCD in a known format (4-digit years, etc).

  A new bcd_clocktime structure is used with the new functions.  It is similar
  to the original clocktime structure, but defines the fields holding BCD
  values as uint8_t (uint16_t for year), and adds a PM flag for handling hours
  using AM/PM mode.

  PR:		224813
  Differential Revision:	https://reviews.freebsd.org/D13730 (no reviewers)

  r328005:
  Convert the x86 RTC driver to use new validated BCD<->timespec conversions.

  New common routines were added to kern/subr_clock.c for converting between
  calendrical time expressed in BCD and struct timespec. The new functions
  return EINVAL on error, as expected when the clock hardware does not provide
  valid time.

  PR:		224813
  Differential Revision:	https://reviews.freebsd.org/D13731 (no reviewers)

  r328039:
  Add static inline rtcin_locked() and rtcout_locked() functions for doing a
  related series of operations without doing a lock/unlock for each byte.
  Use them when reading and writing the entire set of time registers.

  The original rtcin() and writertc() functions which do lock/unlock on each
  byte still exist, because they are public and called by outside code.

  r328068:
  Move some code around and rename a couple variables; no functional changes.

  The static atrtc_set() function was called only from clock_settime(), so
  just move its contents entirely into clock_settime() and delete atrtc_set().

  Rename the struct bcd_clocktime variables from 'ct' to 'bct'.  I had
  originally wanted to emphasize how identical the clocktime and bcd_clocktime
  structs were, but things evolved to the point where the structs are not at
  all identical anymore, so now emphasizing the difference seems better.

  r328069:
  Remove redundant critical_enter/exit() calls.  The block of code delimited
  by these calls is now protected by a spin mutex (obscured within the
  RTC_LOCK/RTC_UNLOCK macros).

  Reported by:	bde@

  r328301:
  Switch to using the bcd_clocktime conversion functinos that validate the BCD
  data without panicking, and have common code for handling AM/PM mode.

  r328302:
  Switch to using the bcd_clocktime conversion functions that validate the BCD
  data without panicking, and have common code for handling AM/PM mode.

  r328303:
  Switch to using the bcd_clocktime conversion functions that validate the BCD
  data without panicking, and have common code for handling AM/PM mode.

Changes:
_U  stable/11/
  stable/11/sys/dev/iicbus/ds1307.c
  stable/11/sys/dev/iicbus/ds1307reg.h
  stable/11/sys/dev/iicbus/ds13rtc.c
  stable/11/sys/dev/iicbus/nxprtc.c
  stable/11/sys/kern/subr_clock.c
  stable/11/sys/sys/clock.h
  stable/11/sys/x86/isa/atrtc.c