Sometimes UTF-8 is the wrong choice.

Some technical background for this post: a computer cannot work with text directly - at a fundamental level it only understands numbers. So what the computer does is use a table that associates every letter with a number. Such a table is called a character set. For example, in the common ASCII table the letter "A" is assigned the number 65.
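To make the table idea concrete, here is a minimal Python sketch using the built-in `ord` and `chr` functions, which convert between a character and its number. The variable names are only for illustration.

```python
# Look up the number assigned to a letter, and the letter assigned to a number.
letter_code = ord("A")   # the number the table assigns to "A"
code_letter = chr(65)    # the letter the table assigns to 65

# Encoding a string with the ASCII table yields one number (byte) per letter.
ascii_bytes = list("ABC".encode("ascii"))

print(letter_code, code_letter, ascii_bytes)  # 65 A [65, 66, 67]
```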

Lots of character sets exist for different systems or languages - and it is fairly obvious that you need a much longer list for, say, Chinese or Japanese than for English characters. But today most of these character sets are pretty much obsolete, and I automatically choose UTF-8 for any new project I work on. UTF-8, an encoding of the Unicode character set, is the only widely supported option that can show characters from ALL languages on the same page.

But: sometimes UTF-8 is the wrong choice. Very wrong. For Wergelandkalenderen I chose UTF-8 for the backend database, for storing both quotes and subscribers. And I've been severely punished with problems.

The reason: we're not yet at a stage where all software supports UTF-8. If you send an E-mail in Norwegian with UTF-8 encoding, you risk that the recipients will be unable to see the three non-English characters in the Norwegian alphabet, æ, ø and å, correctly.
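One common way this goes wrong (a sketch of a plausible failure mode, not a diagnosis of any particular mail client) is that the recipient's software reads the UTF-8 bytes as if they were iso-8859-1. In Python that looks like this:

```python
norwegian = "æøå"

# The sender encodes the text as UTF-8: each of these letters becomes two bytes.
utf8_bytes = norwegian.encode("utf-8")

# A client that wrongly assumes iso-8859-1 reads every byte as one character.
misread = utf8_bytes.decode("iso-8859-1")

print(len(utf8_bytes))  # 6
print(misread)          # Ã¦Ã¸Ã¥
```

Three letters turn into six junk characters, which is the kind of garbage the recipient ends up looking at.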

After considering this (hey there, Wergeland for obvious reasons never used a single Kanji or Hangul in his works, so UTF-8 is NOT really required here) it became obvious that I should have used the standard Western character set, iso-8859-1. Too late though. Now the script that sends out E-mails does the conversion from UTF-8 to iso-8859-1. I hope that all errors in this script are now finally corrected, but I've sent E-mails with a mangled greeting to anyone with æ, ø or å in their names. And any E-mail containing fancy curly quotes or dashes copied from Word has those special characters turned into question marks, since these characters do not exist in iso-8859-1 and cannot be converted. Aargh. But it's a good experience to remember: UTF-8 is THE BEST encoding, but sometimes it is the wrong choice anyway.
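The question-mark problem is easy to reproduce. The sketch below is not the actual mail script; `to_latin1` and `plain_punctuation` are hypothetical helpers written only to show the effect, along with one possible workaround (swapping typographic characters for plain ones before converting).

```python
def to_latin1(text: str) -> bytes:
    """Hypothetical helper: encode as iso-8859-1, replacing unmappable characters with '?'."""
    return text.encode("iso-8859-1", errors="replace")

# Norwegian letters exist in iso-8859-1, so they survive the conversion.
greeting = to_latin1("Kjære Bjørn")

# Curly quotes (U+201C, U+201D) and the en dash (U+2013) do not exist there.
fancy = "\u201cHei\u201d \u2013 hei"
mangled = to_latin1(fancy)

# One possible workaround (an assumption, not what the original script does):
# map the typographic characters to plain equivalents first.
plain_punctuation = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2013": "-"})
cleaned = to_latin1(fancy.translate(plain_punctuation))

print(greeting)  # b'Kj\xe6re Bj\xf8rn'
print(mangled)   # b'?Hei? ? hei'
print(cleaned)   # b'"Hei" - hei'
```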
