Skip to content

Archive for August, 2009

comment

beavers

2009 August 31
 

2009-08-31 beavers - 2

a hot summer hike along the bow river valley, for the first Silver Springs Beavers meeting of the season

converting mediawiki mysql database from latin1 to utf8

2009 August 31
 
by dnorman

Sometime after upgrading our copy of MediaWiki from the antique version I’d had to run, to the shiny latest version, I noticed (well, some of the wiki users noticed first…) that there were some borked characters – accented French characters, Mandarin characters, and fancy schmancy “smart quotes” were displaying as gobbledygook gibberish text. Smelled like a UTF8-related issue – IIRC, MediaWiki switched the front end of the web app to be UTF8, but my database was languishing behind in krufty latin1 encoding. oops.

So, how to convert a database that’s approaching a gigabyte of data? I started googling. Found some hints. But none of the solutions I found actually worked for me. So I googled some more, and duct-tape-and-bubblegummed a script together that seems to have successfully converted the database from latin1 to UTF8.

Here’s what I did, in case I need to do it again…

#!/bin/bash

general config stuff

mysql_path="/usr/local/mysql/bin"

source database config

change this to point to the database server that is currently hosting the database.

source_host="localhost" source_db="source_database_name" # mine was mediawiki source_user="source_database_username" source_pw="password" #change this. duh. temp_sql_dir="/path/to/a/directory"

destination database config

change this to point to the database server that will host the converted database

this could be the same server as above, but use a different database on it.

dest_host="localhost" dest_db="dest_database_name" # mine was mediawiki-utf8 dest_user="dest_database_username" dest_pw="password"

magic happens

clear for table in $mysql_path/mysql --host=$source_host --user=$source_user --password=$source_pw $source_db -e 'show tables' | egrep -v 'Tables_in_'; do echo "Dumping $table"

# dump the table from the first database
$mysql_path/mysqldump --host=$source_host --user=$source_user --password=$source_pw --extended-insert=false --quote-names --default-character-set=latin1 $source_db $table > $temp_sql_dir/$table.sql

# convert the charset declarations from latin1 to utf8
# sed seems to bork mysteriously, and the mysql 'replace' command borks the files, so I settled on perl...
# you could add other transformations here, too...
perl -pi -w -e 's/latin1/utf8/g;' $temp_sql_dir/$table.sql

# import the converted table data into a fresh database table
$mysql_path/mysql --host=$dest_host --user=$dest_user --password=$dest_pw --default-character-set=utf8 $dest_db < $temp_sql_dir/$table.sql

done

or not.

if [ "$table" = "" ]; then echo "No tables found in db: $db" fi

It seems to have worked for me. It took maybe 5 minutes to convert almost a gig of data. It borked on one table – categorylinks failed because of some problem with the key, so I just manually copied it over myself after the script was finished.

It is highly probable that there is a better way to do this. Perhaps even some magic bit that could have been twiddled to do this automagically and/or instantly on the server. I couldn’t find the proper lever to throw, so wound up trying brute force. When in doubt, try brute force. It worked for me. It might not work for you. You’ve been warned.

shoes

2009 August 30
 

2009-08-30 shoes

horseshoes of various sizes, lined up along the side of the workbench in the blacksmith’s shop in heritage park.

rider

2009 August 29
 
by dnorman

2009-08-29 rider

Evan, riding like a pro near the half-way mark of a 10km ride around the community.

not my blog anymore

2009 August 28
 
by dnorman

my-god-its-full-of-comments-fullA couple of weeks ago, the number of words in comments (i.e., stuff you wrote) passed the number of words in posts (i.e., stuff I wrote) on this blog. And now, the comment word count just pushed over half a million words.

This is no longer my blog. I’m not sure what it is, but it ain’t (just) mine, for sure.

ICT Coffee Haus

2009 August 28
 
by dnorman

2009-08-28 - ICT Coffee Haus

I had a coffee-meeting with one of the amazing profs we get to work with.

blue skies ahead

2009 August 27
 
by dnorman

2009-08-27 - blue skies ahead

taken from the bench we usually sit at for morning caffeine, next to the shiny ICT building

in the chair

2009 August 26
 
by dnorman

2009-08-26 in the chair

spent some time this afternoon visiting with my local neighbourhood dental hygienist.

yarisdrops

2009 August 25
 
by dnorman

yaris-drops

I keep checking to see if it’s turning into ice yet. It’s really close…

constructing

2009 August 24
 

constructing

The exterior of the new digital library building is nearly complete.