Upgrade your PostgreSQL databases to UNICODE
Recently, I had the fine oppertunity to upgrade all my CMS powered PostgreSQL databases and perl powered websites from SQL_ASCII to Unicode.
If you don’t know what unicode is, have a look at this article: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
It was less painfull then I thought, in fact there are 6 steps
- Dump your database
- Remove invalid characters from the dump
- Make a diff from the dump
- Remove current database
- Create a new databse
- Load the data into the database you’ve created in step 2
Below is a simple bash script to manage this stuff. Edit it to your needs.
#!/bin/bash
db=$1
# This script updates all dbs to use unicode
dbhost='localhost'
username='larik'
odir=${HOME}/CMS_UPGRADE
if [ "${db}X" == "X" ]
then
echo "I need a db for host ${dbhost} and username ${username} $db"
exit
fi
if [ ! -d $odir ]
then
mkdir $odir || exit "Exit at mkdir"
fi
dump_sql=${odir}/${db}_out.sql
conv_sql=${odir}/${db}_conv.sql
result_sql=${odir}/${db}_result.txt
sql_diff=${odir}/${db}.diff
extra_string="database: ${db}, host: ${dbhost}, user: ${username}"
# Wat dient er te gebeuren:
# 1. Dump database
pg_dump --host=$dbhost --username=$username -D --file=$dump_sql --format=p $db || exit "exit at pg_dump ${extra_string}"
# 2. Remove invalid characters from the dump
/sw/bin/iconv -c -f UTF-8 -t UTF-8 ${dump_sql} > ${conv_sql} || exit "exit at iconv ${extra_string}"
# 3 Make a diff from the dump
diff $dump_sql $conv_sql > $sql_diff
# 4. Remove current database
dropdb --host=$dbhost --username=$username $db || exit "exit at dropdb ${extra_string}"
# 5. Create a new databse
createdb --encoding='unicode' --host=$dbhost --username=$username $db || exit "exit at createdb ${extra_string}"
# 6. Load the data into the database you've created in step 2
psql -f $conv_sql -o $result_sql -h $dbhost -U $username $db || exit "exit at psql ${extra_string}"Notes about this script:
- Make sure you have iconv installed. It is standard on Linux and Mac OS X
- Make sure that the pg_dump you run is the same version as the database you’re dumping
- Check your diff files after the script has run, invalid UTF-8 characters will be removed from and won’t be available in the new sql script
Have fun!
Trackbacks
Use the following link to trackback from your own site:
http://blog.larik.nl/trackbacks?article_id=12