Search Postgresql Archives

Re: converting databases form SQL_ASCII to UTF8

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/22/2011 08:00 AM, Geoffrey Myers wrote:
We are moving our databases to new hardware soon, so we felt it would be a good time to get the encoding correct. Our databases are currently SQL_ASCII and we plan to move them to UTF8.
We are in the same boat, fortunately only on one older server we are upgrading and fortunately for internal apps....


So, as previously noted, there are certain characters that won't load into a UTF8 database from a dump of the SQL_ASCII database.

Here's our problem. We planned on moving databases a few at a time. Problem is, there is a process that pushes data from one database to another. If this process attempts to push data from a SQL_ASCII database to a new UTF8 database and it has one of these characters mentioned above, the process fails.

So, now the question is, is this effort even worth our effort?

What is the harm in leaving our databases SQL_ASCII encoded?

SQL_ASCII is a synonym for "no encoding." You put in a stream of bytes and that's what you get out. That's OK if the byte-stream has exactly the same meaning to every application and user. If that's not the case then you have bytes in your database but you don't know what those bytes are supposed to represent.

In a way, it's like having a generic integer column but depending on the user or the application, that column might represent a unix epoch timestamp, an age in years, a salary, a weight in grams, furlongs per fortnight, etc. And there is no indicator to say which it is. Not good.

We are in the final stages of cleaning up our last bit of non-utf8 data and the above some what silly example is actually not far from the truth. Due to data that arrived from web-inputs, spreadsheet imports, command-line, internal-apps, etc. we have been faced with cleaning tables where one row has only basic ASCII data, another has UTF8, while others have various different Microsoft encodings. With data like that it's pretty much impossible to guarantee that even a simple web-report will display fully correctly.

The longer you wait, the worse it gets. Even though it is only a tiny percentage of our data, cleaning it is still a pain.

Is it worth it? Dunno - you have to calculate the cost/benefit. For us it was a no-brainer to bite-the-bullet and do it.

Cheers,
Steve


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux