Created attachment 157059 [details] Patch by tijl@ (This is mainly for tracking purposes). I have observed this issue with dovecot which the index worker would crash for: Panic: file charset-iconv.c: line 132 (charset_to_utf8): assertion failed: (*src_size - pos <= CHARSET_MAX_PENDING_BUF_SIZE) Having been annoyed by this for some time I have decided to instrument the code to figure out what have happen under the hood. Eventually, I have discovered that if iconv(3) is asked to convert two UTF-7 strings to UTF-8: "+ADw-SPAN+AD4-" And "+ADw-SPAN lang" The second conversion would give wrong results, while GNU implementation of iconv(3) does not have the same issue. Using gdb, the UTF-7 mode was 1 (shift) when processing the second string, while it should be 0, so using an iconv(cd, NULL, NULL, NULL) would mitigate this issue. I have then asked Tijl Coosemans <tijl@> who have quickly found the problem and created a patch, quote: === _citrus_UTF7_mbtoutf16 stored the decoder state at the beginning so it could restore this state on an incomplete character such that the next call would restart the decoding. The problem was that "-" at the end of a string was also treated as an incomplete character but was also removed from the state buffer. So the initial state would be restored (with base64 mode) but the next call would no longer see the "-" and thus continued in base64 mode. This state saving/restoring isn't needed here. It's already handled elsewhere (citrus_iconv_std.c:_citrus_iconv_std_iconv_convert) so the patch removes it. The patch also improves the decoding of 4 byte UTF16 characters. If only 2 bytes can be read it is treated as an incomplete character now (returning an error) whereas before it would be treated as a shift sequence (not an error). A range check has been added for the low 2 bytes as well. ===
Assign to our hero.
A commit references this bug: Author: tijl Date: Sun May 24 15:27:32 UTC 2015 New revision: 283406 URL: https://svnweb.freebsd.org/changeset/base/283406 Log: Fix decoding of UTF-7 when a base64 encoded chunk appears at the end of the input buffer. _citrus_UTF7_mbtoutf16 stored the decoder state at the beginning so it could restore this state on an incomplete character such that the next call would restart the decoding. The problem was that "-" (end of base64 mode) at the end of a string was also treated as an incomplete character but was also removed from the state buffer. So the initial state would be restored (with base64 mode) and the next call would no longer see the "-" so it continued in base64 mode. This state saving/restoring isn't needed here. It's already handled elsewhere (citrus_iconv_std.c:_citrus_iconv_std_iconv_convert) so just remove it. Also initialise *nresult. PR: 200398 Tested by: delphij MFC after: 1 week Changes: head/lib/libiconv_modules/UTF7/citrus_utf7.c
A commit references this bug: Author: tijl Date: Tue Jun 2 09:42:00 UTC 2015 New revision: 283908 URL: https://svnweb.freebsd.org/changeset/base/283908 Log: MFC r283406,283418: Fix decoding of UTF-7 when a base64 encoded chunk appears at the end of the input buffer. _citrus_UTF7_mbtoutf16 stored the decoder state at the beginning so it could restore this state on an incomplete character such that the next call would restart the decoding. The problem was that "-" (end of base64 mode) at the end of a string was also treated as an incomplete character but was also removed from the state buffer. So the initial state would be restored (with base64 mode) and the next call would no longer see the "-" so it continued in base64 mode. This state saving/restoring isn't needed here. It's already handled elsewhere (citrus_iconv_std.c:_citrus_iconv_std_iconv_convert) so just remove it. Also initialise *nresult. When only 2 bytes can be read from a 4 byte UTF-16 character in a base64 encoded chunk of a UTF-7 string, treat that as an incomplete character and return an error instead of a shift sequence and no error. Also check that the low 2 bytes have a valid value. PR: 200398 Changes: _U stable/10/ stable/10/lib/libiconv_modules/UTF7/citrus_utf7.c