Summary: | [patch] [request] bzip2(1): SSE2 optimization for bzip2/libbz2 | ||
---|---|---|---|
Product: | Base System | Reporter: | Mikhail T. <freebsd-2024> |
Component: | bin | Assignee: | freebsd-bugs (Nobody) <bugs> |
Status: | Open --- | ||
Severity: | Affects Only Me | Keywords: | patch |
Priority: | Normal | ||
Version: | 6.2-PRERELEASE | ||
Hardware: | Any | ||
OS: | Any |
Description
Mikhail T.
2006-12-14 21:50:06 UTC
--Boundary-00=_dr1nFvPs1zvnDlW Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline According to Valgrind on amd64-linux, this patch causes bzip2 to do comparisons based on uninitialised memory when compressing, for the attached file (PLIST). ==16140== Conditional jump or move depends on uninitialised value(s) ==16140== at 0x414ECF: mainGtU (blocksort.c:538) ==16140== by 0x414B08: mainSimpleSort (blocksort.c:690) ==16140== by 0x4150E1: mainQSort3 (blocksort.c:805) ==16140== by 0x415FAD: mainSort (blocksort.c:1063) ==16140== by 0x4163F8: BZ2_blockSort (blocksort.c:1244) ==16140== by 0x40DE46: BZ2_compressBlock (compress.c:660) ==16140== by 0x4054B6: handle_compress (bzlib.c:431) ==16140== by 0x40571A: BZ2_bzCompress (bzlib.c:501) ==16140== by 0x407B0D: BZ2_bzWriteClose64 (bzlib.c:1093) ==16140== by 0x4014E1: compressStream (bzip2.c:451) ==16140== by 0x402E20: compress (bzip2.c:1368) ==16140== by 0x40455C: main (bzip2.c:2034) --Boundary-00=_dr1nFvPs1zvnDlW Content-Type: text/plain; charset="us-ascii"; name="PLIST" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="PLIST" share/sgml/docbook/dsssl/modular/README share/sgml/docbook/dsssl/modular/VERSION share/sgml/docbook/dsssl/modular/BUGS share/sgml/docbook/dsssl/modular/TODO > According to Valgrind on amd64-linux, this patch causes bzip2 to
> do comparisons based on uninitialised memory when compressing, for the
> attached file (PLIST).
>
> ==16140== Conditional jump or move depends on uninitialised value(s)
> ==16140== at 0x414ECF: mainGtU (blocksort.c:538)
> ==16140== by 0x414B08: mainSimpleSort (blocksort.c:690)
> ==16140== by 0x4150E1: mainQSort3 (blocksort.c:805)
Julian! You informed me of this issue in a direct e-mail from Dec 19.
I responded the next day on Dec 20:
------------------------------------------------------------
= From my 5-minute investigation, I *think* this is because the loads
= /* Load the bytes: */
= n1 = (__m128i)_mm_loadu_pd((double *)(block + i1));
= n2 = (__m128i)_mm_loadu_pd((double *)(block + i2));
I doubt it -- Valgrind's diagnostics says "Conditional jump or move
depends on uninitialised value(s)". There are no jumps in the above
code, there are two machine instructions (executed in parallel).
------------------------------------------------------------
Have you not received the e-mail with above text? (Message-Id:
<200612200615.22819@Misha>).
Yours,
-mi
I believe this analysis is correct: > /* Load the bytes: */ > n1 = (__m128i)_mm_loadu_pd((double *)(block + i1)); > n2 = (__m128i)_mm_loadu_pd((double *)(block + i2)); > > read beyond the end of the defined area of block. block is > defined for [0 .. nblock + BZ_N_OVERSHOOT - 1], but I think > you are doing a SSE load at &block[nblock + BZ_N_OVERSHOOT - 2], > hence loading 15 bytes of garbage. Valgrind doesn't complain about the out-of-range access, because you are still accessing inside a valid malloc-allocated block. But it does know that the read data is uninitialised, hence it complains when you do a comparison with that data followed by a conditional branch (or move) based on the result of the comparison. > This is possible... You think, the loop should exit earlier and test > the last (up to) 15 bytes one-by-one? Certainly the loop-end stuff needs to be fixed up somehow to reflect the 16 byte loads, but without further investigation I'm not sure how. On Sunday 07 January 2007 00:08, Julian Seward wrote: = > /* Load the bytes: */ = > n1 = (__m128i)_mm_loadu_pd((double *)(block + i1)); = > n2 = (__m128i)_mm_loadu_pd((double *)(block + i2)); = > read beyond the end of the defined area of block. block is = > defined for [0 .. nblock + BZ_N_OVERSHOOT - 1], but I think = > you are doing a SSE load at &block[nblock + BZ_N_OVERSHOOT - 2], = > hence loading 15 bytes of garbage. I don't think, that's quite right... Instead of processing 8 bytes at a time, as the non-SSE code is doing, I'm comparing 16 at a time. Thus it is possible for me to be over by exactly 8 sometimes... Anyway, the problem was stemming from my bumping i1 and i2 by 16 instead of 8 after the _initial check_ (which, in the quadrant-less case should not need to be separate at all, actually). Sometimes _that_ would bring them over... I think, the solution is to either bump up BZ_N_OVERSHOOT even further or check and adjust i1 and i2: if (i1 >= nblock) i1 -= nblock; if (i2 >= nblock) i2 -= nblock; at the beginning, rather than the end of the loop. Having done that, I no longer peek beyond the end of the block (according to gdb's conditional breakpoints, at least). Please, check the new http://aldan.algebra.com/~mi/bz/blocksort-SSE2-patch-2 Yours, -mi P.S. The following gdb-script is what I used. Run as: gdb -x x.txt bzip2 x.txt: break blocksort.c:516 cond 1 (i1 > nblock) || (i2 > nblock) run -9 < /tmp/PLIST > /dev/null andjust the compression level, the input's location, and be sure to have blocksort.o compiled with debug information, of course... For bugs matching the following criteria: Status: In Progress Changed: (is less than) 2014-06-01 Reset to default assignee and clear in-progress tags. Mail being skipped Keyword: patch or patch-ready – in lieu of summary line prefix: [patch] * bulk change for the keyword * summary lines may be edited manually (not in bulk). Keyword descriptions and search interface: <https://bugs.freebsd.org/bugzilla/describekeywords.cgi> |