So, you’ve started programming your own web application, but you’ve hit an unexpected problem: your accented letters / umlauts are displayed as STRANGE characters O.O
This tutorial shows how to handle them ;)

PHP doesn’t treat accented characters specially: a string is just a sequence of bytes, so an accented letter only shows up correctly when its encoding matches what the browser (or the URL) expects. Sometimes that means translating characters into their HTML or URL equivalents. If you’re thinking “oh damn, I’ll have to handle all of them with endless str_replace calls!” you’re wrong. PHP has built-in functions that do this for you.

I first ran into this problem while developing my WoWASDK classes, so I’ll take some examples from them.

URLEncode

This function converts a string into a URL-compatible one. It replaces every character other than letters, digits and -_. with a % followed by two hex digits (spaces become +), so each byte of an accented character turns into a %XX code. Use it when you build links or create URLs dynamically.

// guild.wowasdk.php
class wowasdk_guild {
// cut out ...
	function __construct($region,$server,$name, $force_cache = false) {

		$server = ucfirst(strtolower($server));
		$fileurl = "guilds/$region/$server/$name.xml";
		$name = urlencode($name);

See? $name = urlencode($name). This ensures the name is URL-compatible. Not that hard :)
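To make the effect concrete, here is a minimal sketch of what urlencode does to an accented name. The guild name and the example.com URL are made up for illustration and are not taken from WoWASDK; the accented letter is written as UTF-8 byte escapes so the result doesn’t depend on how the file is saved.

```php
<?php
// Hypothetical name containing "ö" (UTF-8 bytes C3 B6) and a space.
$name = "Ragnar\xC3\xB6s Guild";

// urlencode: each non-ASCII byte becomes %XX, the space becomes "+".
$encoded = urlencode($name);
echo $encoded, "\n";      // Ragnar%C3%B6s+Guild

// rawurlencode: same idea, but the space becomes %20.
$rawEncoded = rawurlencode($name);
echo $rawEncoded, "\n";   // Ragnar%C3%B6s%20Guild

// Illustrative URL only; example.com is a placeholder host.
$url = "http://example.com/guild.xml?n=" . $encoded;
echo $url, "\n";
```

As a rule of thumb, urlencode suits query-string values, while rawurlencode is the safer pick for path segments, where a literal + is not read as a space.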

utf8_decode

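This function converts a UTF-8 string to ISO-8859-1 (Latin-1). It’s a good way to get accented characters printed correctly when your web page is served as ISO-8859-1. Note that utf8_decode is deprecated as of PHP 8.2; mb_convert_encoding is the usual replacement.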

// (unreleased) guild.wowasdk.php
// cut out ...
	function getMembersList($sort = GUILD_SORT_NULL) {
		$return = array();
		foreach($this->xmlsheet->guildInfo->guild->members->character as $char) {
			switch($this->region) {
				case EU:
					$url = "http://eu.wowarmory.com/character-sheet.xml?";
					break;
				case US:
					$url = "http://www.wowarmory.com/character-sheet.xml?";
					break;
			}
			$return[] = array(
				"name" => utf8_decode($char["name"]),
				"gender" => (int)$char["genderId"],
				"raceid" => (int)$char["raceId"],
				"classid" => (int)$char["classId"],
				"level" => (int)$char["level"],
				"rankid" => (int)$char["rank"],
				"url" => $url.$char["url"],
				"achPoints" => (int)$char["achPoints"],
			);
		}
// cut out ...

Line 15 of the example: "name" => utf8_decode($char["name"]). The name coming from the XML is presumably UTF-8, so this converts it to ISO-8859-1 and the accented chars print correctly in an HTML page that uses that encoding; otherwise you’ll see strange chars.
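Since utf8_decode is deprecated in recent PHP versions (8.2 and later), here is a rough sketch of the same UTF-8 to ISO-8859-1 conversion without it. The helper name toLatin1 and the sample name are hypothetical, not part of WoWASDK, and the code assumes that either the mbstring or the iconv extension is available.

```php
<?php
// Hypothetical helper: convert UTF-8 text to ISO-8859-1 without utf8_decode.
function toLatin1(string $utf8): string
{
    if (function_exists('mb_convert_encoding')) {
        // Needs the mbstring extension.
        return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    }
    // Fallback: iconv, which returns false on failure.
    $converted = iconv('UTF-8', 'ISO-8859-1', $utf8);
    return $converted === false ? $utf8 : $converted;
}

$name   = "Th\xC3\xA9o";       // "Théo" in UTF-8: 5 bytes
$latin1 = toLatin1($name);      // "Théo" in ISO-8859-1: 4 bytes

echo bin2hex($name), "\n";     // 5468c3a96f
echo bin2hex($latin1), "\n";   // 5468e96f
```

The hex dump shows the two UTF-8 bytes c3 a9 collapsing into the single Latin-1 byte e9, which is what a page declared as ISO-8859-1 expects.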

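NOTE: encoding mismatches can cause problems even in pure PHP programs (no echo / print). PHP’s standard string functions work on bytes, not characters, so a UTF-8 accented letter counts as two bytes in functions like strlen, and the same word stored in two different encodings won’t compare as equal.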
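One way such problems can show up without any output at all is in string length and comparison. The sketch below uses a made-up sample word written as byte escapes; the mb_strlen call assumes the mbstring extension and is skipped if it is missing.

```php
<?php
$word = "caf\xC3\xA9";           // "café" in UTF-8

// strlen counts bytes: c, a, f plus two bytes for "é".
$bytes = strlen($word);
echo $bytes, "\n";                // 5

// The same word stored as ISO-8859-1 is a different byte sequence,
// so a plain comparison fails.
$latin1 = "caf\xE9";
$same = ($word === $latin1);
var_dump($same);                  // bool(false)

// mb_strlen counts characters when told the encoding (needs mbstring).
if (function_exists('mb_strlen')) {
    echo mb_strlen($word, 'UTF-8'), "\n";   // 4
}
```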

htmlspecialchars

Lastly, here is the function that translates HTML special characters (like &, <, >, ") into their HTML entities. It’s useful if your page breaks or shows strange output when the text contains these characters. It doesn’t touch accented letters; htmlentities converts those as well.

<?php
	$string = "My <articulated> 'string'";
	$string = htmlspecialchars($string);
	echo $string;
?>

This prints the following (from PHP 8.1 on, the default flags also turn the single quotes into &#039;):

My &lt;articulated&gt; 'string'
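As a small sketch of a more explicit variant, the example below passes the flags and charset by hand, which should give the same result regardless of the PHP version’s defaults, and adds htmlentities for accented letters. The accented sample text is made up and written as UTF-8 byte escapes.

```php
<?php
$string = "My <articulated> 'string'";

// Explicit flags and charset: ENT_QUOTES converts both quote styles.
$escaped = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";    // My &lt;articulated&gt; &#039;string&#039;

// htmlentities also converts accented letters to named entities.
$accented = "Caf\xC3\xA9 & Cr\xC3\xA8me";   // "Café & Crème" in UTF-8
$entities = htmlentities($accented, ENT_QUOTES, 'UTF-8');
echo $entities, "\n";   // Caf&eacute; &amp; Cr&egrave;me
```

If the page is already served as UTF-8, htmlspecialchars alone is usually enough, since the accented letters can be sent as they are.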