mp3splt creates ID3v2 tags in UTF-16LE while it should be UTF-16BE according to Wikipedia. ID3Fixer says the tags are incorrect. Workaround: use "-C 8" or even "-C 1". Sorry, I think this PR is one of the weirdest ever. I'm not even sure I should be sending it. The original authors of mp3splt don't seem to be reachable by E-mail.
Thanks for the report, I will look into it. As far as I remember, mp3splt writes id3v2.4.0 tags, and 2.4.0 allows text in both UTF-16 with BOM and UTF-16BE: http://id3.org/id3v2.4.0-structure Does it cause compatibility problems?
(In reply to Anton Yuzhaninov from comment #1) I was basing my report on information from Wikipedia: Textual frames are marked with an encoding byte.[8] $00 – ISO-8859-1 (LATIN-1, Identical to ASCII for values smaller than 0x80). $01 – UCS-2 encoded Unicode with BOM, in ID3v2.2 and ID3v2.3. $02 – UTF-16BE encoded Unicode without BOM, in ID3v2.4. $03 – UTF-8 encoded Unicode, in ID3v2.4. Maybe the Wikipedia information is incorrect. And the MP3 player in my car showed "---------------" instead of English song names until I switched to -C1. This player has problems with Russian tags, I admit, but there should be no problems with plain English song names at least.
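For reference, the encoding bytes quoted above can be turned into a small decoder. This is only a sketch in Python, not code from mp3splt or libid3tag: the function name `decode_text_frame` and the table name are made up for illustration, the codec names are Python's, and it assumes the frame body holds a single string (encoding byte followed by text, optionally null-terminated).

```python
# Hypothetical helper, not part of mp3splt or libid3tag.
# Maps the ID3v2 text encoding byte to a Python codec name.
ID3V2_TEXT_ENCODINGS = {
    0x00: "latin-1",    # ISO-8859-1
    0x01: "utf-16",     # UCS-2 (v2.3) / UTF-16 (v2.4); the BOM selects the byte order
    0x02: "utf-16-be",  # UTF-16BE without BOM, v2.4 only
    0x03: "utf-8",      # UTF-8, v2.4 only
}


def decode_text_frame(body: bytes) -> str:
    """Decode a text frame body: one encoding byte followed by the text."""
    if not body:
        raise ValueError("empty frame body")
    codec = ID3V2_TEXT_ENCODINGS.get(body[0])
    if codec is None:
        raise ValueError(f"unknown encoding byte {body[0]:#04x}")
    # Python's "utf-16" codec reads the BOM and strips it from the result.
    text = body[1:].decode(codec)
    # Drop an optional trailing null terminator.
    return text.rstrip("\x00")
```

With encoding byte $01, both byte orders decode to the same string as long as the BOM is present, which is why a $01 frame with a big-endian BOM is not an error in itself.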
I've looked at my mp3 file written by mp3splt in hd(1); unfortunately I haven't found a more usable tool that shows all the relevant information. In the hex dump I can see: 1. mp3splt writes ID3 v2.4.0 tags; 2. text frames contain the encoding description byte $01 (UTF-16 with BOM), followed by the BOM $feff (big endian), and then the text encoded in UTF-16BE. So as far as I can see, this ID3 tag conforms to the ID3 specification. The only issue I can see is that mp3splt always writes the TIT2, TPE1, TALB and TCON frames in UTF-16, even when the text can be represented in ISO-8859-1 or even ASCII. Probably your hardware player doesn't support UTF-16/UCS-2 and expects all strings to be in ISO-8859-1. For better compatibility with incomplete player implementations, we could check whether the text can be encoded as ISO-8859-1 and, if so, use the $00 text encoding description byte (ISO-8859-1) with the text in ISO-8859-1. But I'm not sure where it is better to implement this auto-detection: in mp3splt or in libid3tag (the library mp3splt uses for writing tags). Unfortunately libid3tag is also not actively developed; the last release was in 2004. I think -C 8 (utf-8) works for this player only by accident - ASCII text in UTF-8 and ISO-8859-1 are the same sequence of bytes. Changing the default option to -C 8 is a bad idea: there are implementations (players) which support UTF-16 (present since v2.2.0), but not UTF-8 (added only in v2.4.0).
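The auto-detection idea from the comment above could look roughly like the following. This is a minimal sketch in Python of the selection logic only, not a patch for mp3splt or libid3tag; the function name `encode_text_frame_body` is hypothetical, and the fallback mirrors what the hex dump showed (encoding byte $01, BOM $feff, then UTF-16BE text).

```python
# Hypothetical helper illustrating the proposed auto-detection;
# neither mp3splt nor libid3tag is known to contain this function.
def encode_text_frame_body(text: str) -> bytes:
    """Return encoding byte + encoded text for an ID3v2 text frame."""
    try:
        # Prefer ISO-8859-1 ($00) whenever every character fits in it.
        return b"\x00" + text.encode("latin-1")
    except UnicodeEncodeError:
        # Otherwise fall back to $01: BOM FE FF followed by big-endian UTF-16.
        return b"\x01" + b"\xfe\xff" + text.encode("utf-16-be")
```

With this approach plain English titles would be written as ISO-8859-1, which older or incomplete players are more likely to handle, while Cyrillic titles would still be written as UTF-16 with BOM.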
(In reply to Anton Yuzhaninov from comment #3) > I think -C 8 (utf-8) works for this player only by accident - ASCII text in UTF-8 and ISO-8859-1 are the same sequence of bytes. No, this is not the case. I have found out that the player shows *Cyrillic* song names correctly provided they are ID3v2 tags in UTF-8 encoding, so this: mp3splt -a -T 2 -C 8 -o @b/@t -c concert.cue concert.mp3 works fine if "concert.cue" is in Russian and in UTF-8.
https://victor-sudakov.dreamwidth.org/434082.html
Have you tried what ID3Fixer https://play.google.com/store/apps/details?id=com.yschi.ID3Fixer thinks of tags created by mp3splt?
I don't see any error messages from ID3Fixer, though it corrupts some (why not all?) files when the output charset is UTF-16. It looks like an ID3Fixer bug. As far as I can see from the hex dump, mp3splt writes correct ID3 tags. Using utf-8 may be a good workaround for the given player, but it is not a sensible default: utf-8 was introduced only in id3v2.4.0, and there are many players which support only id3v2.3.0 (if I am not mistaken, even Windows 7 supports only id3v2.3.0).
(In reply to Anton Yuzhaninov from comment #7) Wikipedia writes that the default charset before the introduction of ID3v2.4 should have been UCS-2, not UTF-16. Can it be the reason for the partial corruption? UTF-16 and UCS-2 are similar but not quite the same, aren't they?
(In reply to vas from comment #8) The text encoding description byte 0x01 means UCS-2 (with BOM) in id3v2.3.0 and UTF-16 with BOM in id3v2.4.0. For the Basic Multilingual Plane (the first 65536 code points) UTF-16 and UCS-2 have the same byte representation. Symbols beyond the BMP can't be represented in UCS-2, and hence in id3v2.3.0, but all popular languages are covered by the BMP. If some implementation supports only id3v2.3.0/UCS-2, then symbols outside the BMP (e.g. Unicode emoji) from id3v2.4.0 tags will be corrupted, but other symbols should be decoded correctly. But I have no files with symbols outside the BMP to test it.
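The BMP point can be checked without any mp3 files. The sketch below, in Python, lists the 16-bit code units that UTF-16 produces for a string; the helper names `utf16_units` and `fits_in_ucs2` are made up for this illustration. A BMP character takes one unit, which is also its UCS-2 value, while a character outside the BMP takes a surrogate pair that a UCS-2-only reader cannot interpret.

```python
# Hypothetical helpers for illustration only.
def utf16_units(text: str) -> list:
    """Return the 16-bit code units of text in UTF-16 (big endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fits_in_ucs2(text: str) -> bool:
    """True if every character lies in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# U+042F (Cyrillic "Я") is in the BMP: one unit, same value as the code point.
print([hex(u) for u in utf16_units("Я")])
# U+1F600 (an emoji) is outside the BMP: encoded as a surrogate pair.
print([hex(u) for u in utf16_units("\U0001F600")])
```

So Russian song names are encoded identically under both readings of encoding byte $01, and only text such as emoji would differ.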
(In reply to Anton Yuzhaninov from comment #9) Anton, This has been a very useful and educational conversation for me. I think we can close this PR, but this was not in vain.
Agreed by the reporter and the maintainer.