Bug 272018 - pci_read_vpd() does not always load VPD even when valid VPD exists because of state machine bug
Summary: pci_read_vpd() does not always load VPD even when valid VPD exists because of...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.1-STABLE
Hardware: Any Any
: --- Affects Some People
Assignee: Stefan Eßer
URL: https://github.com/freebsd/freebsd-sr...
Keywords: needs-patch, needs-qa
Depends on:
Blocks:
 
Reported: 2023-06-15 18:17 UTC by Dave Baukus
Modified: 2023-06-21 18:41 UTC (History)
3 users (show)

See Also:


Attachments
Simpiified VPD parser (15.27 KB, patch)
2023-06-16 15:04 UTC, Stefan Eßer
no flags Details | Diff
Updated VPD patch (14.97 KB, patch)
2023-06-21 16:39 UTC, Stefan Eßer
no flags Details | Diff
Updated VPD patch (14.97 KB, patch)
2023-06-21 16:53 UTC, Stefan Eßer
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dave Baukus 2023-06-15 18:17:09 UTC

    
Comment 1 Dave Baukus 2023-06-15 18:41:33 UTC
The state machine to read a device's VPD in pci_vpd_read() has this statement at the end of the top level switch() statement:

if (cfg->vpd.vpd_ident == NULL || cfg->vpd.vpd_ident[0] == '\0') {
       pci_printf(cfg, "no valid vpd ident found\n");
       state = -2;
}

The issue is the cfg->vpd.vpd_ident[0] == '\0' check.

On the first iteration with state == 0, case 0: if a "name" of 2 is discovered (name = (byte >> 3) & 0xf) then the code falls into a sub-switch() statement where if everything else is valid then vpd_ident is allocated:

switch(name)
case 0x2:
    ...
    ...
    ...
    cfg->vpd.vpd_ident = malloc(remain + 1, M_DEVBUF, M_WAITOK);
    i = 0;
    state = 1;
    break;

Now when the code falls out of the top level switch() the above check triggers.
The code has not assigned anything to cfg->vpd.vpd_ident[0]; it is therefore random luck weather or not a device with valid VPD gets its VPD cached.

My quick hack: assign a characther not '\0' to cfg->vpd.vpd_ident[0] after it is allocated. It will be overwritten in case 1 of the top level switch() on the next iteration; I'm sure there's a more elegant solution.
Comment 2 Stefan Eßer freebsd_committer freebsd_triage 2023-06-16 15:04:07 UTC
Created attachment 242810 [details]
Simpiified VPD parser

The current VPD code uses a state machine to parse an assumed very flexible syntax, while the structure of valid VPD actually follows a quite strict format that can be parsed by a very simple parser that hard-codes this structure.
After fixing a kernel panic due to invalid VPD data that was not rejected by the existing parser, I have re-implemented the VPD parser based on the documented VPD definition.

See review D34268 for the code that I had uploaded as a replacement (after adding many sanity checks to the state machine code that is meant to detect invalid VPD data).

I'm not convinced that the current code's VPD checks catch all cases and found that it is much easier to get a correct and safe parser by removing the unnecessary state machine. (The state machine would make sense, if the order of elements in the VPD structure was less strict, but given the simple "linear" definition of VPD this flexibility stands in the way of easy detection of malformed VPD data.)

The suggested replacement of the current VPD parser "knows" which element to expect next (with some segments being optional) and thus does not need any state variables.

I'd appreciate a review of the patch, which is also attached to the PR (rebased to -CURRENT as of today).

It has been tested with simulated VPD data (correct and malformed) and real VPD device data - test code is available on request.
Comment 3 Stefan Eßer freebsd_committer freebsd_triage 2023-06-16 15:06:25 UTC
(In reply to Stefan Eßer from comment #2)
Since the PR is for FreeBSD-13.1:

The patch in review D34268 will probably still apply to the pci.c file in 13-STABLE, while the patch attached to this PR has been rebased on -CURRENT.
Comment 4 Dave Baukus 2023-06-19 19:33:18 UTC
The changes in https://reviews.freebsd.org/D34268 as applied to Stable 13.1 fix the VPD issue described above.

Thanks.
Comment 5 Stefan Eßer freebsd_committer freebsd_triage 2023-06-21 16:39:04 UTC
Created attachment 242927 [details]
Updated VPD patch

This patch removes a few superfluous tests for allocations that cannot fail and for the size of a resource value (which cannot be below 0, will be <= 255 for all resources, but might theoretically be up to 65535 in case of the ident string - which in practice will be much smaller than 100 bytes, though).
Comment 6 Stefan Eßer freebsd_committer freebsd_triage 2023-06-21 16:53:51 UTC
Created attachment 242928 [details]
Updated VPD patch

Fix style ...
Comment 7 commit-hook freebsd_committer freebsd_triage 2023-06-21 18:35:55 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=586164cc095105f82ffd87997c3aa78f5e21a90c

commit 586164cc095105f82ffd87997c3aa78f5e21a90c
Author:     Stefan Eßer <se@FreeBSD.org>
AuthorDate: 2023-06-21 17:36:39 +0000
Commit:     Stefan Eßer <se@FreeBSD.org>
CommitDate: 2023-06-21 17:36:39 +0000

    dev/pci: simplify PCI VPD access functions

    This update contains a rewrite of the VPD parser based on the
    definition of the structure of the VPD data (ident, R/O resource
    data, optional R/W data, end tag).

    The parser it replaces was based on a state machine, with the tags
    and the parsed data controlling the state changes. The flexibility
    of this parser is actually not required, and it has caused kernel
    panics when operating on malformed data.

    Analysis of the VPD code to make it more robust lead me to believe
    that it was easier to write a "strict" parser than to restrict the
    flexible state machine to detect and reject non-well-formed data.
    A number of restrictions had already been added, but they make the
    state machine ever more complex and harder to understand.

    This updated parser has been verified to return identical parsed data
    as the current implementation for the example VPD data given in the
    PCI standard and in some actual PCIe VPD data.

    It is strict in the sense that it detects and rejects any deviation
    from a well-formed VPD structure.

    PR:             272018
    Approved by:    kib
    MFC after:      4 weeks
    Differential Revision:  https://reviews.freebsd.org/D34268

 sys/dev/pci/pci.c | 551 ++++++++++++++++++++++++++----------------------------
 1 file changed, 263 insertions(+), 288 deletions(-)