Summary: | [amr] [patch] amr_ioctl(): call of malloc() causes memory corruption and panic | ||
---|---|---|---|
Product: | Base System | Reporter: | longwitz |
Component: | kern | Assignee: | John Baldwin <jhb> |
Status: | Closed FIXED | ||
Severity: | Affects Only Me | ||
Priority: | Normal | ||
Version: | 8.2-RELEASE | ||
Hardware: | Any | ||
OS: | Any |
Description
longwitz
2011-03-18 11:50:08 UTC
As John Baldwin pointed out to me, this problem is caused by overrunnig the memory buffer by a specific megarc command. Further analysis showed that indeed this happens. There are several commands leading megarc to acquire a buffer of 12868 bytes but in this case the controller always sends back 25412 bytes. That is the memory corruption. This problem can be patched in megarc, see ports/137938. But I have also seen seen sporadic situations like this: megarc wants a buffer of 36 bytes, but the controller sends 1k data back. Therefore I think it is the best to be on the safe side and allocate always a buffer big enough for all answers of the controller. The following patch now looks good for me: --- amr.c.orig 2010-02-11 19:34:06.000000000 +0100 +++ amr.c 2011-09-09 16:23:07.000000000 +0200 @@ -87,6 +87,7 @@ #include <dev/amr/amrvar.h> #define AMR_DEFINE_TABLES #include <dev/amr/amr_tables.h> +#define MAX_AMR_IOCTL 25600 /* observed: 25412 */ SYSCTL_NODE(_hw, OID_AUTO, amr, CTLFLAG_RD, 0, "AMR driver parameters"); @@ -843,7 +844,7 @@ /* handle inbound data buffer */ if (au_length != 0 && au_cmd[0] != 0x06) { - if ((dp = malloc(au_length, M_AMR, M_WAITOK|M_ZERO)) == NULL) { + if ((dp = malloc(MAX_AMR_IOCTL, M_AMR, M_WAITOK|M_ZERO)) == NULL) { error = ENOMEM; goto out; } -- Dr. Andreas Longwitz Data Service GmbH Beethovenstr. 2A 23617 Stockelsdorf Amtsgericht Lübeck, HRB 318 BS Geschäftsführer: Wilfried Paepcke, Dr. Andreas Longwitz, Josef Flatau It is much better to modify the malloc in the patch in this way: malloc(max(au_length, MAX_AMR_IOCTL), M_AMR, M_WAITOK|M_ZERO)) Thanks to Adran Chadd for this hint. -- Dr. Andreas Longwitz Data Service GmbH Beethovenstr. 2A 23617 Stockelsdorf Amtsgericht Lübeck, HRB 318 BS Geschäftsführer: Wilfried Paepcke, Dr. Andreas Longwitz, Josef Flatau Scott, what do you think of this? Do you know offhand if there is a known maximum reply size for a message from an amr(4) controller? (That is, a better value for MAX_AMR_IOCTL in the proposed patch) -- John Baldwin Is the problem that the buffer in allocated to be too big, or too small? The bug report talks about a 36 byte buffer being requested from megarc, = but the firmware returning 1k instead. If this is the bug that I'm = thinking, then it was fixed in the linux ioctl path a few years ago. I = had hoped that megarc would never be compiled and run natively on = FreeBSD, so I didn't fix it in the native path. The fix is to always allocate a minimum of 4k and attach it to the = command sent to the card, but only copy back to user land the length = that the ioctl requests. This is what linux does. It's a bug in the = megaraid firmware, and it's a bug in megarc, and I had hoped that LSI = would be better than this. Note that the linux ioctl path in the driver = that I mentioned only does this as a special case when the length is 0. = It sounds like it needs to be done for all cases of min(au_lenth, 4096). I'm not sure why there's a discussion on max length; if I'm not = understanding the problem, please let me know. Did a bit of history review. When an ioctl goes through the linux = driver, it goes through a pool allocator to get a scratch buffer for the = command. The pool sizes are PAGE_SIZE (4K) and power-of-2 multiples = above that, up to 64k. So when an application requests a buffer size of = 0-4k, a 4k buffer gets sent to the card. A ~25k buffer request actually = gets 32k sent to the card. The firmware seems to exploit this and = basically ignores the specified length, happily filling the provided = buffer with whatever power-of-2 data count it feels like. According to = an old conversation with Dell and LSI, the LSI engineers consider this = to be a necessary feature, not a bug. As I mentioned previously, I fixed this as a special case of len=3D=3D0 = in the linux ioctl emulation path. Based on the bug report here, the = fix needs to be expanded to rounding up to power-of-2 sizes in both the = native and emulated paths. Anyone who wants to tackle this is welcome to do so. The existing hack = also suffers from bug in the form of it copying the 4k buffer back to = user land, even though userland specified a length of 0. This is = probably a bad thing, and it's something that linux appears to handle = correctly.= Ok, I've adapted that into a patch at http://www.freebsd.org/~jhb/patches/amr_buffer_len.patch Andreas, can you test that updated patch to verify it fixes your issue? -- John Baldwin John, I did several tests with your patch in 8.2 and everything works fine, if I use the binary version of megarc with the patch included described in ports/137938. The original megarc sends amr_ioctl's with length 12868 (e.g. the first ioctl of the command "megarc -ctlrinfo -a0") and your patch calls the controller with real_length=16384, but the controller returns 25412 Bytes. This happens all the time on nearly every megarc command, I think this is a program error in megarc, he uses user_cmd=0xa104 with buffer length 12868, but the firmware of the controller replies with 25412 bytes. So we have memory corruption of 25412 - 16384 = 9026 bytes. The patch in ports/137938 changes the lenght field in megarc from 12868 to 25412 to avoid this problem. A line like if( len == 12868 ) len = 25412; would solve this problem in the driver. I did not find any other static problems of this type. Another story are dynamic problems. When the controller is very busy, I see sometimes 1KB bytes returned from the controller, when lenght is much lower. This problem is handled by your patch in all cases. Andreas Longwitz On Thursday, April 19, 2012 4:12:50 pm Andreas Longwitz wrote:
> John,
> I did several tests with your patch in 8.2 and everything works fine, if
> I use the binary version of megarc with the patch included described in
> ports/137938.
>
> The original megarc sends amr_ioctl's with length 12868 (e.g. the first
> ioctl of the command "megarc -ctlrinfo -a0") and your patch calls the
> controller with real_length=16384, but the controller returns 25412
> Bytes. This happens all the time on nearly every megarc command, I think
> this is a program error in megarc, he uses user_cmd=0xa104 with buffer
> length 12868, but the firmware of the controller replies with 25412
> bytes. So we have memory corruption of 25412 - 16384 = 9026 bytes. The
> patch in ports/137938 changes the lenght field in megarc from 12868 to
> 25412 to avoid this problem. A line like
> if( len == 12868 ) len = 25412;
> would solve this problem in the driver. I did not find any other static
> problems of this type.
>
> Another story are dynamic problems. When the controller is very busy, I
> see sometimes 1KB bytes returned from the controller, when lenght is
> much lower. This problem is handled by your patch in all cases.
Hmm, given the above, I'm tempted to just force the buffer to always be at
least 32k. Scott, what do you think about that?
--
John Baldwin
On Thursday, April 19, 2012 4:12:50 pm Andreas Longwitz wrote: > John, > I did several tests with your patch in 8.2 and everything works fine, if > I use the binary version of megarc with the patch included described in > ports/137938. > > The original megarc sends amr_ioctl's with length 12868 (e.g. the first > ioctl of the command "megarc -ctlrinfo -a0") and your patch calls the > controller with real_length=16384, but the controller returns 25412 > Bytes. This happens all the time on nearly every megarc command, I think > this is a program error in megarc, he uses user_cmd=0xa104 with buffer > length 12868, but the firmware of the controller replies with 25412 > bytes. So we have memory corruption of 25412 - 16384 = 9026 bytes. The > patch in ports/137938 changes the lenght field in megarc from 12868 to > 25412 to avoid this problem. A line like > if( len == 12868 ) len = 25412; > would solve this problem in the driver. I did not find any other static > problems of this type. > > Another story are dynamic problems. When the controller is very busy, I > see sometimes 1KB bytes returned from the controller, when lenght is > much lower. This problem is handled by your patch in all cases. > > Andreas Longwitz Ah, ok. I think we should just make the minimum buffer size 32k which should workaround this. I've updated the patch at the same URL: http://www.freebsd.org/~jhb/patches/amr_buffer_len.patch -- John Baldwin Worst firmware ever. Seriously. If this was open bad, the driver would be r= emoved and the hardware blacklisted as a security threat. Scott On Apr 20, 2012, at 6:13 AM, John Baldwin <jhb@freebsd.org> wrote: > On Thursday, April 19, 2012 4:12:50 pm Andreas Longwitz wrote: >> John, >> I did several tests with your patch in 8.2 and everything works fine, if >> I use the binary version of megarc with the patch included described in >> ports/137938. >>=20 >> The original megarc sends amr_ioctl's with length 12868 (e.g. the first >> ioctl of the command "megarc -ctlrinfo -a0") and your patch calls the >> controller with real_length=3D16384, but the controller returns 25412 >> Bytes. This happens all the time on nearly every megarc command, I think >> this is a program error in megarc, he uses user_cmd=3D0xa104 with buffer >> length 12868, but the firmware of the controller replies with 25412 >> bytes. So we have memory corruption of 25412 - 16384 =3D 9026 bytes. The >> patch in ports/137938 changes the lenght field in megarc from 12868 to >> 25412 to avoid this problem. A line like >> if( len =3D=3D 12868 ) len =3D 25412; >> would solve this problem in the driver. I did not find any other static >> problems of this type. >>=20 >> Another story are dynamic problems. When the controller is very busy, I >> see sometimes 1KB bytes returned from the controller, when lenght is >> much lower. This problem is handled by your patch in all cases. >>=20 >> Andreas Longwitz >=20 > Ah, ok. I think we should just make the minimum buffer size 32k which > should workaround this. I've updated the patch at the same URL: >=20 > http://www.freebsd.org/~jhb/patches/amr_buffer_len.patch >=20 > --=20 > John Baldwin "openbsd". <i>When computers attack!<i> On Apr 20, 2012, at 7:30 AM, Scott Long <scott4long@yahoo.com> wrote: > Worst firmware ever. Seriously. If this was open bad, the driver would b= e removed and the hardware blacklisted as a security threat. >=20 > Scott >=20 > On Apr 20, 2012, at 6:13 AM, John Baldwin <jhb@freebsd.org> wrote: >=20 >> On Thursday, April 19, 2012 4:12:50 pm Andreas Longwitz wrote: >>> John, >>> I did several tests with your patch in 8.2 and everything works fine, if= >>> I use the binary version of megarc with the patch included described in >>> ports/137938. >>>=20 >>> The original megarc sends amr_ioctl's with length 12868 (e.g. the first >>> ioctl of the command "megarc -ctlrinfo -a0") and your patch calls the >>> controller with real_length=3D16384, but the controller returns 25412 >>> Bytes. This happens all the time on nearly every megarc command, I think= >>> this is a program error in megarc, he uses user_cmd=3D0xa104 with buffer= >>> length 12868, but the firmware of the controller replies with 25412 >>> bytes. So we have memory corruption of 25412 - 16384 =3D 9026 bytes. The= >>> patch in ports/137938 changes the lenght field in megarc from 12868 to >>> 25412 to avoid this problem. A line like >>> if( len =3D=3D 12868 ) len =3D 25412; >>> would solve this problem in the driver. I did not find any other static >>> problems of this type. >>>=20 >>> Another story are dynamic problems. When the controller is very busy, I >>> see sometimes 1KB bytes returned from the controller, when lenght is >>> much lower. This problem is handled by your patch in all cases. >>>=20 >>> Andreas Longwitz >>=20 >> Ah, ok. I think we should just make the minimum buffer size 32k which >> should workaround this. I've updated the patch at the same URL: >>=20 >> http://www.freebsd.org/~jhb/patches/amr_buffer_len.patch >>=20 >> --=20 >> John Baldwin > Worst firmware ever. Seriously. Agree, but I run three dozen servers with amr controllers of four different types and all behave equal. Without source and/or docs I can not decide what is firmware and/or megarc error. I must live with this situation and I am happy, that this is the only corner in my FreeBSD world that is not "open". >> Ah, ok. I think we should just make the minimum buffer size 32k which >> should workaround this. I've updated the patch at the same URL: >> >> http://www.freebsd.org/~jhb/patches/amr_buffer_len.patch >> >> -- >> John Baldwin I have tested the updated patch, for me it is ok. Thanks you all for investigating in this problem. Andreas Longwitz State Changed From-To: open->patched Fix committed to HEAD. Responsible Changed From-To: freebsd-bugs->jhb Fix committed to HEAD. Author: jhb Date: Fri Apr 20 20:27:31 2012 New Revision: 234501 URL: http://svn.freebsd.org/changeset/base/234501 Log: The amr(4) firmware contains a rather dubious "feature" where it assumes for small buffers (< 64k) that the OS driver is actually using a buffer rounded up to the next power of 2. It also assumes that the buffer is at least 4k in size. Furthermore, there is at least one known instance of megarc sending a request with a 12k buffer where the firmware writes out a 24k-ish reply. To workaround the data corruption triggered by this "feature", ensure that buffers for user commands use a minimum size of 32k, and that buffers between 32k and 64k use a 64k buffer. PR: kern/155658 Submitted by: Andreas Longwitz longwitz incore de Reviewed by: scottl MFC after: 1 week Modified: head/sys/dev/amr/amr.c Modified: head/sys/dev/amr/amr.c ============================================================================== --- head/sys/dev/amr/amr.c Fri Apr 20 16:56:14 2012 (r234500) +++ head/sys/dev/amr/amr.c Fri Apr 20 20:27:31 2012 (r234501) @@ -535,6 +535,31 @@ shutdown_out: amr_startup(sc); } +/* + * Bug-for-bug compatibility with Linux! + * Some apps will send commands with inlen and outlen set to 0, + * even though they expect data to be transfered to them from the + * card. Linux accidentally allows this by allocating a 4KB + * buffer for the transfer anyways, but it then throws it away + * without copying it back to the app. + * + * The amr(4) firmware relies on this feature. In fact, it assumes + * the buffer is always a power of 2 up to a max of 64k. There is + * also at least one case where it assumes a buffer less than 16k is + * greater than 16k. Force a minimum buffer size of 32k and round + * sizes between 32k and 64k up to 64k as a workaround. + */ +static unsigned long +amr_ioctl_buffer_length(unsigned long len) +{ + + if (len <= 32 * 1024) + return (32 * 1024); + if (len <= 64 * 1024) + return (64 * 1024); + return (len); +} + int amr_linux_ioctl_int(struct cdev *dev, u_long cmd, caddr_t addr, int32_t flag, struct thread *td) @@ -664,16 +689,7 @@ amr_linux_ioctl_int(struct cdev *dev, u_ error = ENOIOCTL; break; } else { - /* - * Bug-for-bug compatibility with Linux! - * Some apps will send commands with inlen and outlen set to 0, - * even though they expect data to be transfered to them from the - * card. Linux accidentally allows this by allocating a 4KB - * buffer for the transfer anyways, but it then throws it away - * without copying it back to the app. - */ - if (!len) - len = 4096; + len = amr_ioctl_buffer_length(imax(ali.inlen, ali.outlen)); dp = malloc(len, M_AMR, M_WAITOK | M_ZERO); @@ -703,7 +719,7 @@ amr_linux_ioctl_int(struct cdev *dev, u_ status = ac->ac_status; error = copyout(&status, &((struct amr_mailbox *)&((struct amr_linux_ioctl *)addr)->mbox[0])->mb_status, sizeof(status)); if (ali.outlen) { - error = copyout(dp, (void *)(uintptr_t)mb->mb_physaddr, len); + error = copyout(dp, (void *)(uintptr_t)mb->mb_physaddr, ali.outlen); if (error) break; } @@ -750,7 +766,7 @@ amr_ioctl(struct cdev *dev, u_long cmd, struct amr_command *ac; struct amr_mailbox_ioctl *mbi; void *dp, *au_buffer; - unsigned long au_length; + unsigned long au_length, real_length; unsigned char *au_cmd; int *au_statusp, au_direction; int error; @@ -842,8 +858,9 @@ amr_ioctl(struct cdev *dev, u_long cmd, } /* handle inbound data buffer */ + real_length = amr_ioctl_buffer_length(au_length); if (au_length != 0 && au_cmd[0] != 0x06) { - if ((dp = malloc(au_length, M_AMR, M_WAITOK|M_ZERO)) == NULL) { + if ((dp = malloc(real_length, M_AMR, M_WAITOK|M_ZERO)) == NULL) { error = ENOMEM; goto out; } @@ -902,7 +919,7 @@ amr_ioctl(struct cdev *dev, u_long cmd, /* build the command */ ac->ac_data = dp; - ac->ac_length = au_length; + ac->ac_length = real_length; ac->ac_flags |= AMR_CMD_DATAIN|AMR_CMD_DATAOUT; /* run the command */ _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" State Changed From-To: patched->closed Fix merged to 8 and 9. The last update of amr.c (SVN rev 236319) needs a correction, because the controller should see the length of the buffer of the userland program: @@ -919,7 +919,7 @@ /* build the command */ ac->ac_data = dp; - ac->ac_length = real_length; + ac->ac_length = au_length; ac->ac_flags |= AMR_CMD_DATAIN|AMR_CMD_DATAOUT; /* run the command */ Without this patch some busy controllers only return zeros on every amr_ioctl. To do what is described in the comment "Bug-for-bug compatibility .." we should allocate a buffer in the case of userland length 0 too: @ -859,11 +859,11 @@ /* handle inbound data buffer */ real_length = amr_ioctl_buffer_length(au_length); + if ((dp = malloc(real_length, M_AMR, M_WAITOK|M_ZERO)) == NULL) { + error = ENOMEM; + goto out; + } if (au_length != 0 && au_cmd[0] != 0x06) { - if ((dp = malloc(real_length, M_AMR, M_WAITOK|M_ZERO)) == NULL) { - error = ENOMEM; - goto out; - } if ((error = copyin(au_buffer, dp, au_length)) != 0) { free(dp, M_AMR); return (error); Andreas Longwitz Author: jhb Date: Wed Sep 19 11:54:32 2012 New Revision: 240692 URL: http://svn.freebsd.org/changeset/base/240692 Log: As a followup to r234501, ensure that the native ioctl path always allocates a 4kb buffer if a request uses a buffer size of 0. (The Linux ioctl path already did this.) PR: kern/155658 Submitted by: Andreas Longwitz MFC after: 1 week Modified: head/sys/dev/amr/amr.c Modified: head/sys/dev/amr/amr.c ============================================================================== --- head/sys/dev/amr/amr.c Wed Sep 19 11:40:17 2012 (r240691) +++ head/sys/dev/amr/amr.c Wed Sep 19 11:54:32 2012 (r240692) @@ -846,11 +846,8 @@ amr_ioctl(struct cdev *dev, u_long cmd, /* handle inbound data buffer */ real_length = amr_ioctl_buffer_length(au_length); + dp = malloc(real_length, M_AMR, M_WAITOK|M_ZERO); if (au_length != 0 && au_cmd[0] != 0x06) { - if ((dp = malloc(real_length, M_AMR, M_WAITOK|M_ZERO)) == NULL) { - error = ENOMEM; - goto out; - } if ((error = copyin(au_buffer, dp, au_length)) != 0) { free(dp, M_AMR); return (error); @@ -920,8 +917,7 @@ amr_ioctl(struct cdev *dev, u_long cmd, error = copyout(dp, au_buffer, au_length); } debug(2, "copyout %ld bytes from %p -> %p", au_length, dp, au_buffer); - if (dp != NULL) - debug(2, "%p status 0x%x", dp, ac->ac_status); + debug(2, "%p status 0x%x", dp, ac->ac_status); *au_statusp = ac->ac_status; out: _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" On Wednesday, September 19, 2012 6:13:33 am Andreas Longwitz wrote: > The last update of amr.c (SVN rev 236319) needs a correction, because > the controller should see the length of the buffer of the userland program: > > @@ -919,7 +919,7 @@ > > /* build the command */ > ac->ac_data = dp; > - ac->ac_length = real_length; > + ac->ac_length = au_length; > ac->ac_flags |= AMR_CMD_DATAIN|AMR_CMD_DATAOUT; > > /* run the command */ > > Without this patch some busy controllers only return zeros on every > amr_ioctl. Hmm. We pass in the real_length in the Linux ioctl path as well. If we want to fix that we should fix it in both places. Also, I don't see how this can work properly for a size of zero. bus_dma won't populate the S/G list in that case and you are back to the original buggy case for a size of 0. Are you sure that you need this patch and not just the fix for commands with a size of zero? > To do what is described in the comment "Bug-for-bug compatibility .." > we should allocate a buffer in the case of userland length 0 too: I've committed a slightly modified version of this (no need to check for NULL from a M_WAITOK malloc(), they always succeed). -- John Baldwin Hi, >> The last update of amr.c (SVN rev 236319) needs a correction, because >> the controller should see the length of the buffer of the userland program: >> >> @@ -919,7 +919,7 @@ >> >> /* build the command */ >> ac->ac_data = dp; >> - ac->ac_length = real_length; >> + ac->ac_length = au_length; >> ac->ac_flags |= AMR_CMD_DATAIN|AMR_CMD_DATAOUT; >> >> /* run the command */ >> >> Without this patch some busy controllers only return zeros on every >> amr_ioctl. > > Hmm. We pass in the real_length in the Linux ioctl path as well. If we want > to fix that we should fix it in both places. Yes, thats true. I would prefer to set ac->ac_length always to the length the user program gives us, it was done in this way until 1.90.2.4. The real_length should be used only for the malloc to prevent the panic when the controller does not respect the given length. > Also, I don't see how this can > work properly for a size of zero. bus_dma won't populate the S/G list in that > case and you are back to the original buggy case for a size of 0. There are amr_ioctl() calls from megarc with au_length=0, but the controller sends back 148 bytes to the buffer dp, which was allocated in real_length. I have checked this with a special ioctl_hook > Are you sure that you need this patch and not just the fix for commands with a size of > zero? Yes. After introducing your patch on my servers I found that on some of them megarc and amrstat did not work correct anymore. Sometime for some hours the amr_ioctl did not give any output. I tried a simple amrstat (au_length=1024, therefore real_length=32K) every 5 minutes on 10 servers. The result was very confused. Some server ok, some never ok, some for four hours ok, then not, ... Going back to 1.90.2.4 - with an updated malloc( max(au_length, 25600)) - lets the problem disappear. After all Dr. Rasch pointed out to me, that the only difference for the interface to the controller of your patch is the value of ac->ac_length. After changing ac->ac_length from real_length (32K on every call) to au_length (for most calls of megarc only some bytes, 1K for amrstat) everything works fine now on all my servers for more than a week. I think the controller is overstrained with a lot of 32k requests. >> To do what is described in the comment "Bug-for-bug compatibility .." >> we should allocate a buffer in the case of userland length 0 too: > > I've committed a slightly modified version of this (no need to check for > NULL from a M_WAITOK malloc(), they always succeed). Thats ok for me, thanks! Andreas Longwitz On Wednesday, September 19, 2012 12:25:00 pm Andreas Longwitz wrote: > Hi, > > >> The last update of amr.c (SVN rev 236319) needs a correction, because > >> the controller should see the length of the buffer of the userland program: > >> > >> @@ -919,7 +919,7 @@ > >> > >> /* build the command */ > >> ac->ac_data = dp; > >> - ac->ac_length = real_length; > >> + ac->ac_length = au_length; > >> ac->ac_flags |= AMR_CMD_DATAIN|AMR_CMD_DATAOUT; > >> > >> /* run the command */ > >> > >> Without this patch some busy controllers only return zeros on every > >> amr_ioctl. > > > > Hmm. We pass in the real_length in the Linux ioctl path as well. If we want > > to fix that we should fix it in both places. > Yes, thats true. I would prefer to set ac->ac_length always to the > length the user program gives us, it was done in this way until > 1.90.2.4. The real_length should be used only for the malloc to prevent > the panic when the controller does not respect the given length. Except that this can't really work. The controller wants a buffer somewhere that it can DMA to. What addresses do you expect it to use for its larger buffer if we don't give it an address? One option is to use contigmalloc() instead of malloc() to allocate a contiguous buffer perhaps if the adapter just assumes that the extra bytes are physically contiguous after the last S/G entry. However, that still won't work for the 0 length case. In the 0 length case the bus_dma code won't populate the S/G list at all, so the adapter will just scribble over random memory when it dumps something out. That takes us right back to the original bug this was trying to fix. > > Also, I don't see how this can > > work properly for a size of zero. bus_dma won't populate the S/G list in that > > case and you are back to the original buggy case for a size of 0. > > There are amr_ioctl() calls from megarc with au_length=0, but the > controller sends back 148 bytes to the buffer dp, which was allocated in > real_length. I have checked this with a special ioctl_hook Right, but with your change we won't actually send the address of that buffer to the adapter. > > Are you sure that you need this patch and not just the fix for commands with a size of > > zero? > > Yes. After introducing your patch on my servers I found that on some of > them megarc and amrstat did not work correct anymore. Sometime for some > hours the amr_ioctl did not give any output. I tried a simple amrstat > (au_length=1024, therefore real_length=32K) every 5 minutes on 10 > servers. The result was very confused. Some server ok, some never ok, > some for four hours ok, then not, ... Going back to 1.90.2.4 - with an > updated malloc( max(au_length, 25600)) - lets the problem disappear. > After all Dr. Rasch pointed out to me, that the only difference for the > interface to the controller of your patch is the value of ac->ac_length. > After changing ac->ac_length from real_length (32K on every call) to > au_length (for most calls of megarc only some bytes, 1K for amrstat) > everything works fine now on all my servers for more than a week. I > think the controller is overstrained with a lot of 32k requests. I think the correct fix is to adjust the logic that sets real_length, not to undo the previous change (which reverting to real_length will do). If you don't send the updated length to the adapter you have no control over where it writes its overflow data into memory. I do think it might be true that if it is trying to read in 32k that might be a problem, but I think your solution just brings back the old bug. I looked at the history of this PR, and I think we should just refine the limit logic to go back to rounding up to a power of 2, but handle your special case of a 12k buffer that gets a returned size larger than 16k by rounding requests between 8k and 16k up to 32k rather than 16k. Try this: Index: amr.c =================================================================== --- amr.c (revision 240692) +++ amr.c (working copy) @@ -533,13 +533,19 @@ shutdown_out: * The amr(4) firmware relies on this feature. In fact, it assumes * the buffer is always a power of 2 up to a max of 64k. There is * also at least one case where it assumes a buffer less than 16k is - * greater than 16k. Force a minimum buffer size of 32k and round - * sizes between 32k and 64k up to 64k as a workaround. + * greater than 16k. However, forcing all buffers to a size of 32k + * causes stalls in the firmware. Force each command smaller than + * 64k up to the next power of two except that commands between 8k + * and 16k are rounded up to 32k instead of 16k. */ static unsigned long amr_ioctl_buffer_length(unsigned long len) { + if (len <= 4 * 1024) + return (4 * 1024); + if (len <= 8 * 1024) + return (8 * 1024); if (len <= 32 * 1024) return (32 * 1024); if (len <= 64 * 1024) -- John Baldwin >> I would prefer to set ac->ac_length always to the >> length the user program gives us, it was done in this way until >> 1.90.2.4. The real_length should be used only for the malloc to prevent >> the panic when the controller does not respect the given length. > > Except that this can't really work. The controller wants a buffer somewhere that > it can DMA to. What addresses do you expect it to use for its larger buffer if > we don't give it an address? One option is to use contigmalloc() instead of malloc() > to allocate a contiguous buffer perhaps if the adapter just assumes that the extra > bytes are physically contiguous after the last S/G entry. Ok, thanks for your explanation. So it seems to me, DMA did work for me with my first max(au_lenght, 25600) construct because this is in practice always equal to 25600 < MAXBSIZE (=64k). > However, that still won't work for the 0 length case. In the 0 length case the > bus_dma code won't populate the S/G list at all, so the adapter will just scribble > over random memory when it dumps something out. That takes us right back to the > original bug this was trying to fix. Ok, I now have understand better the DMA stuff in amr and I am wondering how this could worked for length 0 before. > I think the correct fix is to adjust the logic that sets real_length, not > to undo the previous change (which reverting to real_length will do). > > If you don't send the updated length to the adapter you have no control > over where it writes its overflow data into memory. I do think it might > be true that if it is trying to read in 32k that might be a problem, but > I think your solution just brings back the old bug. > > I looked at the history of this PR, and I think we should just refine the > limit logic to go back to rounding up to a power of 2, but handle your > special case of a 12k buffer that gets a returned size larger than 16k > by rounding requests between 8k and 16k up to 32k rather than 16k. Try > this: > > Index: amr.c > =================================================================== > --- amr.c (revision 240692) > +++ amr.c (working copy) > @@ -533,13 +533,19 @@ shutdown_out: > * The amr(4) firmware relies on this feature. In fact, it assumes > * the buffer is always a power of 2 up to a max of 64k. There is > * also at least one case where it assumes a buffer less than 16k is > - * greater than 16k. Force a minimum buffer size of 32k and round > - * sizes between 32k and 64k up to 64k as a workaround. > + * greater than 16k. However, forcing all buffers to a size of 32k > + * causes stalls in the firmware. Force each command smaller than > + * 64k up to the next power of two except that commands between 8k > + * and 16k are rounded up to 32k instead of 16k. > */ > static unsigned long > amr_ioctl_buffer_length(unsigned long len) > { > > + if (len <= 4 * 1024) > + return (4 * 1024); > + if (len <= 8 * 1024) > + return (8 * 1024); > if (len <= 32 * 1024) > return (32 * 1024); > if (len <= 64 * 1024) > I have introduced this patch on some servers and will report the result. -- Andreas Longwitz >
> Index: amr.c
> ===================================================================
> --- amr.c (revision 240692)
> +++ amr.c (working copy)
> @@ -533,13 +533,19 @@ shutdown_out:
> * The amr(4) firmware relies on this feature. In fact, it assumes
> * the buffer is always a power of 2 up to a max of 64k. There is
> * also at least one case where it assumes a buffer less than 16k is
> - * greater than 16k. Force a minimum buffer size of 32k and round
> - * sizes between 32k and 64k up to 64k as a workaround.
> + * greater than 16k. However, forcing all buffers to a size of 32k
> + * causes stalls in the firmware. Force each command smaller than
> + * 64k up to the next power of two except that commands between 8k
> + * and 16k are rounded up to 32k instead of 16k.
> */
> static unsigned long
> amr_ioctl_buffer_length(unsigned long len)
> {
>
> + if (len <= 4 * 1024)
> + return (4 * 1024);
> + if (len <= 8 * 1024)
> + return (8 * 1024);
> if (len <= 32 * 1024)
> return (32 * 1024);
> if (len <= 64 * 1024)
>
I have tested this patch for 10 days on 10 different server, no error
occurred. Patch works !
Andreas Longwitz
Author: jhb Date: Fri Oct 5 15:52:31 2012 New Revision: 241228 URL: http://svn.freebsd.org/changeset/base/241228 Log: Further adjust the workaround in r234501. Rounding all small requests up to 32k swamped the controller causing firmware hangs. Instead, round requests smaller than 64k up to the next power of 2 as a general rule. To handle the one known special case of a command that accepts a 12k buffer returning a 24k-ish reply, round requests between 8k and 16k up to 32k rather than 16k. The result is that commands less than 8k should now be rounded up to a smaller size (either 4k or 8k) rather than 32k. PR: kern/155658 Tested by: Andreas Longwitz MFC after: 1 week Modified: head/sys/dev/amr/amr.c Modified: head/sys/dev/amr/amr.c ============================================================================== --- head/sys/dev/amr/amr.c Fri Oct 5 15:36:30 2012 (r241227) +++ head/sys/dev/amr/amr.c Fri Oct 5 15:52:31 2012 (r241228) @@ -533,13 +533,19 @@ shutdown_out: * The amr(4) firmware relies on this feature. In fact, it assumes * the buffer is always a power of 2 up to a max of 64k. There is * also at least one case where it assumes a buffer less than 16k is - * greater than 16k. Force a minimum buffer size of 32k and round - * sizes between 32k and 64k up to 64k as a workaround. + * greater than 16k. However, forcing all buffers to a size of 32k + * causes stalls in the firmware. Force each command smaller than + * 64k up to the next power of two except that commands between 8k + * and 16k are rounded up to 32k instead of 16k. */ static unsigned long amr_ioctl_buffer_length(unsigned long len) { + if (len <= 4 * 1024) + return (4 * 1024); + if (len <= 8 * 1024) + return (8 * 1024); if (len <= 32 * 1024) return (32 * 1024); if (len <= 64 * 1024) _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" |