Suggestion: Internationalisation

[expired user #2749]'s profile image [expired user #2749] posted 17 years ago in Feature discussion Permalink
First I should say that I love the HeidiSQL GUI, because it delivers great productivity. There is one small catch: HeidiSQL doesn't seamlessly support international character sets yet. Have you ever tried to design a database for Chinese, Russian or Thai applications? Unfortunately, you can't get it to work with HeidiSQL.

It's a bummer since international character encoding is fully supported by the MySQL server and should be very easy to implement.

So, here is my suggestion. Not that I want to complain or pressure you guys, but this is really a snap to implement compared to the wonderful features you have already implemented.

1. Read the character encoding at the host level (default should be Unicode UTF-8) and set the connection accordingly (using the "SET NAMES..." syntax).
2. Read the character encoding at the database level and set the connection accordingly when connecting to a database or switching from one DB to another (again using the "SET NAMES..." syntax).
3. Allow users to set character encoding at the database level with an additional drop-down box when CREATE DATABASE is issued.
4. Allow users to set character encoding at the table level with an additional drop-down box when CREATE TABLE is issued.
5. Column encoding follows the TABLE or DATABASE default. Specific encoding at COLUMN level is rarely used, so if someone needs it, then the CREATE or ALTER TABLE statement could be entered manually.

This should get HeidiSQL ready to conquer Asia and the rest of the world... happy

Cheers, SQuareLogic
ansgar's profile image ansgar posted 17 years ago Permalink
Thanks a lot for your constructive proposals.

I think the additional pulldown menus in the dialogs for "Create database" and "Create table" are easy to implement. We can fetch a complete list of supported charsets at startup with this statement and put them into the pulldowns:
SHOW CHARSET

(Again, we will have to find out since which MySQL-version this statement is supported, btw)

Also setting the appropriate character set would then be not the biggest task. We can fetch the charset from the database or table and fire
SET NAMES 'big5'

for example, before you get into it.

Ok, then we would have some more consistency at the management level when it comes to create tables and databases.

What will be completely ignoring all these changes are the datagrids in both "Data" and "Query" tabs. The grid is a component which does not support multibyte character sets. I tried different ideas in childwin.pas:
griddata.Font.Charset := CHINESEBIG5_CHARSET;
FConn.MysqlConn.Properties.add('Codepage=big5');
ExecuteNonQuery( 'SET NAMES big5' );

With absolutely no effort.

However, we could switch to a unicode enabled grid at sometime in the future, but that will take some time.
ansgar's profile image ansgar posted 17 years ago Permalink
Besides from the problems with the datagrids: Would it help you if we would first add the above mentioned pulldown menus to the "Create db/table" dialogs?
[expired user #2749]'s profile image [expired user #2749] posted 17 years ago Permalink

The grid is a component which does not support multibyte character sets.



Ouch! :(

Besides from the problems with the datagrids: Would it help you if we would first add the above mentioned pulldown menus to the "Create db/table" dialogs?



Yes, certainly. I think that's a step in the right direction as it would already enable users to create Unicode-enabled DDL. It's certainly useful for people who use HeidiSQL primarily as a DB development tool.

Cheers, SQareLogic
[expired user #2801]'s profile image [expired user #2801] posted 17 years ago Permalink

Thanks a lot for your constructive proposals.

I think the additional pulldown menus in the dialogs for "Create database" and "Create table" are easy to implement. We can fetch a complete list of supported charsets at startup with this statement and put them into the pulldowns:

SHOW CHARSET

(Again, we will have to find out since which MySQL-version this statement is supported, btw)



As I recall full unicode support was added in version 4.1 along with the accompanying features like this, but I'm not 100% sure on that. At least I can confirm that SHOW CHARSET isn't a valid query on 4.0.20.

As an alternative you might be able to use SHOW VARIABLES and fetch character_set and character_sets from the resultset. These seem to indicate current charset and a list of supported ones, respectively.
ansgar's profile image ansgar posted 17 years ago Permalink
Thanks a lot, Remigus. The docs do not reveal the exact version since which charactersets are supported but I found out that the result of a SHOW CREATE DATABASE contains a version conditional, so I'm mostly sure it's 4.1
CREATE DATABASE `steuerlex` /*!40100 DEFAULT CHARACTER SET latin1 */


However, charset + collation can now be specified on the CREATE DATABASE dialog. As a benefit, on 5.0.2 servers you can afterwards change those settings for a database. On 5.1.7 servers you can even rename the database.

See also:
http://rfe.heidisql.com/1693393 (The official feature-request)
http://fisheye.heidisql.com/933 (ALTER DATABASE implemented)
http://fisheye.heidisql.com/928 (Charset + Collation implemented)
ansgar's profile image ansgar posted 17 years ago Permalink
One thing I'm not 100% sure about is: Should CREATE DATABASE follow the charset settings of the server or the setting of the client? To be more precise, should the default be the result of:
SHOW VARIABLES LIKE 'character_set_server'

or
SHOW VARIABLES LIKE 'character_set_client'

?

First I had it implemented using 'character_set_database' which ended up to be totally wrong as that represents the charset of the currently used database.

But what should be more relevant - the client's or the server's charset?
[expired user #2814]'s profile image [expired user #2814] posted 17 years ago Permalink
Hello all.
Hope you'll find this helpful.
MySQL started to support database charset with 4.1, including CREATE DATABASE - I've just tried it with 4.1.13a and PhpMyAdmin 2.6.3 pl1. This query worked without errors: "CREATE DATABASE `test123` DEFAULT CHARACTER SET utf8 COLLATE utf8_slovenian_ci;"; and the charset has indeed been applied to the new database. To be precise, PhpMyAdmin lets you choose the collation when creating - the character set is implied in the collation you take.
As far as I know the variables you get with SHOW VARIABLES LIKE CHARSET* (this includes character_set_connection/database/results and system, too) and LIKE COLLATION* (_connection/database/server) are set in my.ini and read at server start. Changing them at runtime is doable, but it sounds to me like looking for trouble (it might affect - without notice -other users logged on that server), as it's much simpler to use DDL.
Character_set_database and collation_database are what any new database will use if not instructed otherwise. A fresh out-of-the-box install defaults to respectively utf8 and utf8_swedish_ci.
Thanks a lot for your efforts.
ansgar's profile image ansgar posted 17 years ago Permalink
Thanks a lot.

As far as I know the variables you get with SHOW VARIABLES LIKE CHARSET* (this includes character_set_connection/database/results and system, too) and LIKE COLLATION* (_connection/database/server) are set in my.ini and read at server start. Changing them at runtime is doable, but it sounds to me like looking for trouble (it might affect - without notice -other users logged on that server), as it's much simpler to use DDL.



Heidi would not change any system variable here. The question was just which characterset should be the default when creating new databases.

Character_set_database and collation_database are what any new database will use if not instructed otherwise. A fresh out-of-the-box install defaults to respectively utf8 and utf8_swedish_ci.
Thanks a lot for your efforts.



The system variable "character_set_database" is definitely changing when you switch to a database which has another characterset than the one in your my.ini. That means, it cannot be the default when creating a database. Instead we have to decide whether to use character_set_server or character_set_client.
[expired user #2814]'s profile image [expired user #2814] posted 17 years ago Permalink
You're right about 'character_set_database', my bad.

My favorites are 'character_set_server' and 'collation_server', even if I feel one could live with whichever default as long as it's possible to change it at CREATE time.
I don't have a particularly solid argument for this, it's just that people that doesn't know or care about character set issues are unlikely to mess with it in my.ini, leaving it to the 'factory' defaults. Sounds like a safer option to me...
[expired user #3251]'s profile image [expired user #3251] posted 17 years ago Permalink
I think a problem that can still be fixed regarding internationalization is the Load SQL function. I have UTF-8 SQL files but when HeidiSQL loads them the non-latin characters become scrambled, making the import a failure despite the database and server collation settings.
[expired user #2814]'s profile image [expired user #2814] posted 17 years ago Permalink
Not sure if this would help, but you can try "SET NAMES utf8" before importing.
[expired user #3251]'s profile image [expired user #3251] posted 17 years ago Permalink
no good unfortunately.
[expired user #1919]'s profile image [expired user #1919] posted 16 years ago Permalink
Is this issue been fixed or is it still an issue?

I get some odd chars when using Heidi, I thought it was me adding the bad chars in.
Unless there is a setting to change in Heidi, I can't use it for Imports / Exports.

Cheers
ansgar's profile image ansgar posted 16 years ago Permalink
Most unicode related issues have been fixed, while i'm sure there are some minor ones left but:
- export to SQL files always writes UTF-8 files
- loading SQL files detects the file BOM and correctly interpretes UTF-8 and UTF-16 files
- export to CSV/HTML/XML files also writes UTF-8
- import from CSV files allows selecting a character set
- grid editing allows unicode chars.

For all cases it's important for the user to ensure the relevant database, table and columns have the correct charset. For example, you cannot expect a latin1 table to store international characters correctly, even if Heidi does its best to allow that in its GUI controls.
[expired user #1919]'s profile image [expired user #1919] posted 16 years ago Permalink
Thanks

Please login to leave a reply, or register at first.