Hi there,
As I mentioned, my Windows 7 system runs under code page 932 (Japanese), which is a double-byte encoding for certain byte ranges. If you check
http://en.wikipedia.org/wiki/Shift_JIS you'll see that bytes from 0x81 upward (0x81-0x9F and 0xE0-0xFC) are the first byte of a double-byte character.
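A minimal Python sketch of the lead-byte rule. It assumes the CP932 lead-byte ranges are 0x81-0x9F and 0xE0-0xFC, and the function names are hypothetical. Python's built-in cp932 codec rejects a lone 0x81 under strict decoding, because a lead byte must be followed by a trail byte:

```python
def is_cp932_lead_byte(b: int) -> bool:
    """Return True if byte value b starts a double-byte character in code page 932."""
    # Assumed lead-byte ranges for CP932: 0x81-0x9F and 0xE0-0xFC
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def decodes_strictly(data: bytes) -> bool:
    """Return True if data is valid CP932 text under strict decoding."""
    try:
        data.decode("cp932")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(is_cp932_lead_byte(0x81))       # True: 0x81 waits for a trail byte
    print(is_cp932_lead_byte(0x80))       # False
    print(decodes_strictly(b"\x81"))      # False: a lone lead byte is not valid text
    print(decodes_strictly(b"\x81\x45"))  # True: 0x8145 is a complete character
```

A single-byte BLOB containing 0x81 is perfectly valid binary data, but as CP932 text it is an incomplete character.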
Now my problem is that HeidiSQL somehow converts pure binary data stored in a BLOB field whenever it processes that data for output (on screen, in SELECT query results, or with the "Export database as SQL" feature).
Below is the output of "Export database as SQL", which shows the same problematic result:
CREATE TABLE IF NOT EXISTS `blobtesttable` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`data` blob NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=ascii COLLATE=ascii_bin;
INSERT INTO `blobtesttable` (`id`, `data`) VALUES
(1, _binary 0x80),
(2, _binary 0x8145),
(3, _binary 0x8145),
(4, _binary 0x8145),
(5, _binary 0x8145),
(6, _binary 0x814500),
(7, _binary 0x8145),
(8, _binary 0x8145),
(9, _binary 0x8145);
For instance, when I insert a single byte of binary data, 0x81, into the table, HeidiSQL reports that the field holds two bytes, 0x8145.
Yet "SELECT hex(data) ..." (pasted into my previous post) clearly shows that the table stores the original 0x81 that was inserted. Using hex(data) in the query makes the server do the binary-to-string conversion instead of HeidiSQL, so the HeidiSQL client has no chance to transform the data any further.
What breaks here is that exported BLOB data (whether copied from one database to another or backed up to a file) is not the binary data as stored in the database but a corrupted form with extra bytes added. Running "mysqldump --hex-blob" directly on the server machine does no such transformation.
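To check an export row by row, the `_binary 0x...` literals in the dump can be compared with what the server itself returns from `SELECT id, HEX(data) FROM blobtesttable`. A minimal Python sketch, assuming the dump rows look like the ones above; the function names and the sample values in the usage lines are hypothetical:

```python
import re

# Matches rows such as "(3, _binary 0x8145)" in an "Export database as SQL" dump.
ROW_PATTERN = re.compile(r"\((\d+),\s*_binary\s+0x([0-9A-Fa-f]*)\)")


def blob_values_from_dump(sql_text: str) -> dict:
    """Map each row id to the bytes of its _binary hex literal."""
    return {int(row_id): bytes.fromhex(hex_digits)
            for row_id, hex_digits in ROW_PATTERN.findall(sql_text)}


def mismatched_rows(dump_values: dict, server_hex: dict) -> dict:
    """Compare dump bytes with HEX(data) strings returned by the server.

    server_hex maps id -> the string produced by SELECT id, HEX(data).
    Returns id -> (expected, exported) for every row that differs.
    """
    result = {}
    for row_id, hex_string in server_hex.items():
        expected = bytes.fromhex(hex_string)
        exported = dump_values.get(row_id)  # None if the row is missing
        if exported != expected:
            result[row_id] = (expected, exported)
    return result


if __name__ == "__main__":
    dump = "(1, _binary 0x80),\n(2, _binary 0x8145);"
    server = {1: "80", 2: "81"}  # hypothetical HEX(data) results
    print(mismatched_rows(blob_values_from_dump(dump), server))
    # {2: (b'\x81', b'\x81E')}
```

Because HEX(data) is computed on the server, its result is independent of the client's code page, which makes it a reasonable reference for the comparison.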
After some more checks, it seems that every byte marked as "first byte of a double-byte JIS X 0208 character" in the Shift JIS byte map on Wikipedia gets converted to 0x8145 by HeidiSQL on this machine.
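One mechanism that would produce exactly this pattern (an assumption, not confirmed): the BLOB bytes are decoded as CP932 text into UTF-16, the way a Windows conversion API such as MultiByteToWideChar would do it, an incomplete lead byte is replaced by U+30FB KATAKANA MIDDLE DOT (assumed here to be the CP932 default character), and the text is then encoded back to CP932, where U+30FB is 0x8145. Python's cp932 codec raises on bad input instead, so this sketch imitates the substitution with a custom error handler (the name `cp932_default_char` is hypothetical). Latin-1 stands in for a single-byte European code page:

```python
import codecs


# Hypothetical error handler: replace each undecodable byte run with U+30FB,
# assumed here to be the character Windows substitutes for invalid CP932 input.
def _katakana_middle_dot(exc: UnicodeError):
    if isinstance(exc, UnicodeDecodeError):
        return ("\u30fb", exc.end)
    raise exc


codecs.register_error("cp932_default_char", _katakana_middle_dot)


def roundtrip_as_text(data: bytes, codepage: str = "cp932") -> bytes:
    """Treat binary data as text in the given code page, then encode it back."""
    text = data.decode(codepage, errors="cp932_default_char")
    return text.encode(codepage)


def altered_single_bytes(codepage: str) -> list:
    """List every byte value whose one-byte BLOB changes in the round trip."""
    return [b for b in range(256)
            if roundtrip_as_text(bytes([b]), codepage) != bytes([b])]


if __name__ == "__main__":
    print(roundtrip_as_text(b"\x81").hex())   # 8145
    print(altered_single_bytes("latin-1"))    # []: every byte survives
    print([hex(b) for b in altered_single_bytes("cp932")])
```

Under these assumptions, every CP932 lead byte in a single-byte BLOB comes back as 0x8145, while bytes such as 0x41 pass through untouched, and a single-byte code page alters nothing.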
I am quite sure this is not something HeidiSQL does deliberately; I assume some component that relies on the system code page for text processing is wrongly being applied to pure binary data.
That would make it a bug that does not show up on European or American Windows installations, whose single-byte code pages (such as 1252) have no lead bytes. In 2014, Windows is still a huge mess when it comes to Unicode support. It's hilarious and sad...
Bernhard