UTF8 BOM

lotiara posted 4 days ago in General

Hi Ansgar, I have to run a bunch of SQL files that are stored as UTF-8 with BOM. Is it possible to add UTF-8 BOM support to the encoding list? I've read the forum and this topic has already been addressed: the solution is to open the BOM files with Notepad++ and convert them to UTF-8. This works, but in my case there are a lot of files and I have to do this many times. Maybe with the new Delphi version and SynEdit it would be possible to add this feature easily?

Thank you.

ansgar posted 4 days ago

I cannot recall exactly why I did not add a "UTF-8 BOM" encoding to the file-open dialog. Perhaps because such files were (and still are) quite unpopular, at least in my impression, and at least for .sql files.

I see there is no feature request in the tracker asking for BOM encoding support.

Does it really cause problems if you select "UTF-8" encoding instead?
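To illustrate what "plain UTF-8" means for a file that starts with a BOM, here is a small Python sketch. It uses Python's own codecs purely as an analogy and says nothing about how HeidiSQL decodes files internally: a plain UTF-8 decoder keeps the BOM as an invisible leading U+FEFF character, while Python's `utf-8-sig` codec strips it.

```python
# Bytes of a tiny .sql file saved as "UTF-8 with BOM" (EF BB BF prefix).
raw = b"\xef\xbb\xbfSELECT 1;"

# Plain UTF-8 keeps the BOM as an invisible U+FEFF character at position 0,
# which a SQL parser may then treat as part of the first token.
plain = raw.decode("utf-8")

# "utf-8-sig" removes a leading BOM if present (and also accepts files without one).
sig = raw.decode("utf-8-sig")

print(repr(plain))  # '\ufeffSELECT 1;'
print(repr(sig))    # 'SELECT 1;'
```

The only difference between the two results is that one extra, invisible character in front of `SELECT`.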

lotiara posted 4 days ago

I am getting more and more files with that kind of charset (I don't know why). Yes, HeidiSQL fails to run those files (with UTF-8 selected) because of the charset; the content is not even visible in the query editor. But if I open them in Notepad++ and save them as UTF-8, there is no error. Running SQL files is very useful: I can select more than 30 SQL files and it runs them nicely. The problem is the BOM.

Regards

TTSneko posted 4 days ago

@lotiara: it is not a "type of charset"; your SQL snippets are UTF-8 text files containing a Byte Order Mark (BOM). It is YOUR responsibility to ensure the correct encoding of the imports/snippets/data you use, and a BOM is NOT one of the things you want. Make sure to strip such garbage from your snippets/data if you do not work with UTF-8 data streams.

As some users may be working with (and thus storing) such stream markers in data fields (for apps that rely on a BOM), Anse cannot simply strip BOM markers from all types of imports.

A BOM is neither required nor recommended for UTF-8 anyway, as it serves no purpose except marking the start of a UTF-8 stream. It isn't even an effective way to identify UTF-8, because the BOM may simply be parsed as ordinary characters in other encodings (i.e. a potentially dangerous entity, like an additional, unescaped comma in a CSV file!). A BOM should only be used when data is transmitted in a multi-byte format (UTF-16/32) without any specification of endianness.
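Both points can be checked with Python's standard `codecs` module (a small illustration, not tied to HeidiSQL): the UTF-8 BOM bytes misread as Windows-1252 turn into three visible characters, whereas for UTF-16 the BOM is what tells the decoder the byte order.

```python
import codecs

# The UTF-8 BOM is the fixed byte sequence EF BB BF.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Misread as Windows-1252, those bytes become three visible characters
# that would end up in front of the first SQL keyword.
misread = codecs.BOM_UTF8.decode("cp1252")
assert misread == "\u00ef\u00bb\u00bf"  # i.e. "ï»¿"

# For UTF-16 the BOM carries real information: the byte order.
little = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")  # FF FE 41 00
big = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")     # FE FF 00 41

# The generic "utf-16" codec reads the BOM, picks the byte order and drops the BOM.
assert little.decode("utf-16") == "A"
assert big.decode("utf-16") == "A"

# Reading the big-endian payload with the wrong byte order yields a different character.
assert big[2:].decode("utf-16-le") == "\u4100"
```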

lotiara posted 4 days ago

"It is YOUR responsibility to ensure the correct encoding of used imports/snippets/data, and BOM is NOT one of the things you want. Make sure to strip such garbage from your snippets/data if you do not work with UTF8 data streams." Keep cool, TTSneko. I am asking for something simple, if possible. I am not generating those garbage SQL files; they are given to me. If it's possible and easy, nice. If not, bad luck. But stay calm.

Bye.

TTSneko posted 4 days ago

Sorry, squeezing German nature into English writing often ends up sounding arrogant or aggressive. I merely wanted to stress that a BOM is not only useless but can cause massive problems.

I get that you received the files "as is". Whoever produces them does not seem to give a damn about what may happen to your data, go figure.

As you mention a large number of snippets, one could of course load all of them into a text editor and bulk-save them without a BOM; however, I do not know whether Notepad++, which you used, can do that (I use KATE on Linux). Or perhaps check a Windows PowerShell solution: https://ss64.org/viewtopic.php?t=365 ... just adjust the file extension accordingly :)

lotiara posted 4 days ago

No problem. I already have a command-line solution, but more than once I forget to run it, and that costs a lot of time.

Anyway, a BOM really is a source of problems; I don't even know why they use it.

Bye.

ansgar posted 3 days ago

@lotiara could you please create a feature ticket on GitHub, so I can plan it for the next release (probably not 12.9, but the one after that). Thank you!

lotiara posted 3 days ago

Hi Ansgar, of course.