Upgrade your PostgreSQL databases to UNICODE

Posted by Frodo Larik Mon, 13 Mar 2006 20:03:00 GMT

Recently, I had the fine oppertunity to upgrade all my CMS powered PostgreSQL databases and perl powered websites from SQL_ASCII to Unicode.

If you don’t know what unicode is, have a look at this article: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

It was less painfull then I thought, in fact there are 6 steps

  1. Dump your database
  2. Remove invalid characters from the dump
  3. Make a diff from the dump
  4. Remove current database
  5. Create a new databse
  6. Load the data into the database you’ve created in step 2

Below is a simple bash script to manage this stuff. Edit it to your needs.

#!/bin/bash

db=$1

# This script updates all dbs to use unicode

dbhost='localhost'
username='larik'
odir=${HOME}/CMS_UPGRADE

if [ "${db}X" == "X" ]
then
   echo "I need a db for host ${dbhost} and username ${username} $db"
   exit
fi

if [ ! -d $odir ]
then
   mkdir $odir || exit "Exit at mkdir"
fi

dump_sql=${odir}/${db}_out.sql
conv_sql=${odir}/${db}_conv.sql
result_sql=${odir}/${db}_result.txt
sql_diff=${odir}/${db}.diff

extra_string="database: ${db}, host: ${dbhost}, user: ${username}"

# Wat dient er te gebeuren:

# 1. Dump database

pg_dump --host=$dbhost --username=$username -D --file=$dump_sql --format=p $db || exit "exit at pg_dump ${extra_string}"

# 2. Remove invalid characters from the dump

/sw/bin/iconv -c -f UTF-8 -t UTF-8 ${dump_sql} > ${conv_sql}  || exit "exit at iconv ${extra_string}"

# 3 Make a diff from the dump

diff $dump_sql $conv_sql > $sql_diff

# 4. Remove current database

dropdb --host=$dbhost --username=$username $db || exit "exit at dropdb ${extra_string}"

# 5. Create a new databse

createdb --encoding='unicode' --host=$dbhost --username=$username $db || exit "exit at createdb ${extra_string}"

# 6. Load the data into the database you've created in step 2
psql -f $conv_sql -o $result_sql -h $dbhost -U $username $db || exit "exit at psql ${extra_string}"

Notes about this script:

  • Make sure you have iconv installed. It is standard on Linux and Mac OS X
  • Make sure that the pg_dump you run is the same version as the database you’re dumping
  • Check your diff files after the script has run, invalid UTF-8 characters will be removed from and won’t be available in the new sql script

Have fun!

Trackbacks

Use the following link to trackback from your own site:
http://blog.larik.nl/trackbacks?article_id=12