Bug 222066 - mpt crash in virtualbox
Summary: mpt crash in virtualbox
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: vbox
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-05 12:28 UTC by Andriy Gapon
Modified: 2018-06-07 09:55 UTC
CC List: 5 users

See Also:


Description Andriy Gapon freebsd_committer 2017-09-05 12:28:43 UTC
We are seeing a kernel crash in the mpt driver while running FreeBSD as a guest in VirtualBox.

The problem seems to be caused by VirtualBox setting the request frame size parameter to 512 bytes (the field counts 32-bit words, so 128 × 4 = 512 bytes):

pReply->IOCFacts.u16RequestFrameSize  = 128;    /* @todo Figure out where it is needed. */

The driver does not seem to be able to cope with such a large frame size.
It could be argued that the bug is on the VirtualBox side.  I am not sure whether it really needs such a size (especially given the comment).

Anyway, it would be useful if mpt could handle that value.

A bit of code analysis follows.
In the code we have the following important definitions:

/* MPT_RQSL- size of request frame, in bytes */
#define MPT_RQSL(mpt)           (mpt->ioc_facts.RequestFrameSize << 2)

#define MPT_MAX_REQUESTS(mpt)   512
#define MPT_REQUEST_AREA        512
#define MPT_SENSE_SIZE          32      /* included in MPT_REQUEST_AREA */
#define MPT_REQ_MEM_SIZE(mpt)   (MPT_MAX_REQUESTS(mpt) * MPT_REQUEST_AREA)

So the code allocates 512 request buffers of 512 bytes each (256 KiB in total) as a single contiguous (both physically and virtually) buffer suitable for DMA between the driver and the hardware (see mpt_dma_buf_alloc).

When the crash happens, it's a page fault here:
memcpy <= mpt_read_cfg_page <= mpt_action

The problematic request:
(kgdb) p *req
$1 = {links = {tqe_next = 0xfffffe000179c390, tqe_prev = 0xfffffe0001798438}, state = 10, index = 511, IOCStatus = 0, ResponseCode = 0,
  serno = 37161, ccb = 0x0, req_vbuf = 0xfffffe0000286e00, sense_vbuf = 0xfffffe0000286fe0, req_pbuf = 2089315840, sense_pbuf = 2089316320,
  dmap = 0x0, chain = 0x0, callout = {c_links = {le = {le_next = 0x0, le_prev = 0xfffffe0001522df8}, sle = {sle_next = 0x0}, tqe = {
        tqe_next = 0x0, tqe_prev = 0xfffffe0001522df8}}, c_time = 3787081006447, c_precision = 1342177187, c_arg = 0xfffff80038870000,
    c_func = 0xffffffff804a4c30 <mpt_timeout>, c_lock = 0xfffffe0001798008, c_flags = 0, c_iflags = 0, c_cpu = 0}}


We see that index is 511, so this is the last request object with its buffer in the last 512 bytes of the contiguous buffer.
We see that the page fault happens right beyond the allocated buffer region.

So my interpretation is that the RequestFrameSize reported by the (emulated) hardware is too large for the hard-coded request buffer size.  The problem is masked for all buffers but the last, because the hardware simply overwrites the next request buffer and the driver reads from it.  So there is no page fault, although there is a chance of silent data corruption.
For the last buffer there is obviously no next buffer and we get the page fault.

Conclusions:
- first of all, the driver should check MPT_RQSL against MPT_REQUEST_AREA and refuse to attach if the request frame size is too large
- we could also consider bumping MPT_REQUEST_AREA to, e.g., 1024, though the check is probably the better option
- the Linux driver seems to cap the request frame size at 128 bytes:
  http://elixir.free-electrons.com/linux/latest/source/drivers/message/fusion/mptbase.c#L3214