PHP simplexml Entities
What's going on here?
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<album>
<img src="002.jpg" caption="w&aacute;ssup?" />
</album>
XML;
$xml = simplexml_load_string($string);
// $xml = simplexml_load_file("xml.xml"); // fails the same way when loading from a file
echo "<pre>";
var_dump($xml);
echo "</pre>";
Error:
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 5: parser error: entity 'aacute' not defined
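&aacute; is not an XML entity; you are thinking of HTML. XML predefines only five named entities (&lt;, &gt;, &amp;, &quot; and &apos;); any other name, such as aacute, has to be declared in a DTD or the parser rejects it.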
In XML, special characters are normally written as-is. Running html_entity_decode() on the input (don't forget to specify UTF-8 as the character set) should do the trick:
$string = html_entity_decode($string, ENT_QUOTES, "utf-8");
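A minimal sketch of this answer's approach, assuming the input is UTF-8; the sample strings are made up for illustration. It also shows a caveat the answer doesn't mention: html_entity_decode() decodes the predefined XML entities as well, so an escaped &lt; inside an attribute becomes a raw < and the document is no longer well-formed.

```php
<?php
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<album>
<img src="002.jpg" caption="w&aacute;ssup?" />
</album>
XML;

// Turn HTML named entities such as &aacute; into literal UTF-8 characters.
$decoded = html_entity_decode($string, ENT_QUOTES, "UTF-8");
$xml = simplexml_load_string($decoded);
echo $xml->img["caption"], "\n"; // wássup?

// Caveat: &lt; is decoded too, and a raw "<" is not allowed in an attribute value.
libxml_use_internal_errors(true); // collect parser errors instead of printing warnings
$broken = html_entity_decode('<a t="1 &lt; 2"/>', ENT_QUOTES, "UTF-8");
var_dump(simplexml_load_string($broken)); // bool(false)
```

This is why the last answer below shields the five XML entities before decoding.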
You can check out Matt Robinson's article on an alternative method: converting named entities to numeric entities in PHP. It mentions html_entity_decode() (already pointed out by another answer) and some potential pitfalls:
There are two possible problems with this approach. The first is invalid entities: html_entity_decode() won't touch them, which means you'll get XML errors anyway. The second is encoding. I suppose you might not really want UTF-8. You should, because it's awesome, but maybe you have a good reason. If you don't tell html_entity_decode() to use UTF-8, it won't convert entities that don't exist in the character set you specify. If you tell it to output UTF-8 and then use something like iconv() to convert the result, you will lose any characters that are not in the output encoding.
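A rough sketch of the numeric-entity idea, assuming UTF-8 input. This is not the article's code: named_to_numeric_entities() and utf8_codepoint() are hypothetical helpers written for illustration. Numeric character references such as &#225; are valid in any XML document regardless of its encoding, and the five predefined XML entities are left alone.

```php
<?php
// Hypothetical helper (not from the article): rewrite HTML named entities such as
// &aacute; as numeric character references (&#225;), which XML parsers understand.
function named_to_numeric_entities(string $s): string
{
    $map = [];
    // Maps each character to its HTML 4.01 named entity, e.g. "á" => "&aacute;".
    foreach (get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8') as $char => $entity) {
        // Skip the characters XML already escapes with its own entities.
        if (in_array($char, ['&', '<', '>', '"', "'"], true)) {
            continue;
        }
        $map[$entity] = '&#' . utf8_codepoint($char) . ';';
    }
    // strtr() tries longer keys first and never rescans replaced text.
    return strtr($s, $map);
}

// Hypothetical helper: decode one UTF-8 character to its code point (avoids needing mbstring).
function utf8_codepoint(string $c): int
{
    $b = array_values(unpack('C*', $c));
    switch (count($b)) {
        case 1: return $b[0];
        case 2: return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3: return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default: return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
            | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

$in = '<img caption="w&aacute;ssup &lt;3 &euro;" />';
echo named_to_numeric_entities($in), "\n"; // <img caption="w&#225;ssup &lt;3 &#8364;" />
$xml = simplexml_load_string(named_to_numeric_entities($in));
```

The first pitfall from the quote still applies: an unknown name such as &bogus; is not in the table, stays as-is, and the XML parser will reject it.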
If you find his script too cumbersome, you can use the one used for SourceRally instead.
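Try this function, simplexml_load_entity_string(). It hides the five predefined XML entities behind placeholders, decodes every other HTML entity, then restores them, so an escaped &lt; survives: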
<?php
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<album>
<img src="002.jpg" caption="test&lt;w&aacute;ssup?" />
</album>
XML;
$xml = simplexml_load_entity_string($string);
var_dump($xml);
function simplexml_load_entity_string($string = '')
{
    // Hide the five predefined XML entities behind placeholders so html_entity_decode() leaves them alone
    $string = str_replace([
        '&quot;', '&amp;', '&apos;', '&lt;', '&gt;',
    ], [
        'SPECIALquotMARK', 'SPECIALampMARK', 'SPECIALaposMARK', 'SPECIALltMARK', 'SPECIALgtMARK',
    ], $string);
    // Decode all remaining HTML entities (e.g. &aacute;) into UTF-8 characters
    $string = html_entity_decode($string, ENT_QUOTES, "utf-8");
    // Put the predefined XML entities back
    $string = str_replace([
        'SPECIALquotMARK', 'SPECIALampMARK', 'SPECIALaposMARK', 'SPECIALltMARK', 'SPECIALgtMARK',
    ], [
        '&quot;', '&amp;', '&apos;', '&lt;', '&gt;',
    ], $string);
// load xml
return simplexml_load_string($string);
}
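A variant of the same idea, sketched under the assumption that the input only uses HTML 4.01 entity names; decode_html_named_entities() is a hypothetical name. Instead of placeholders it matches each named entity with a regex and decodes it unless it is one of the five XML ones. That avoids clashing with text that happens to contain a placeholder string, and it leaves numeric references such as &#60; for the XML parser to handle, whereas html_entity_decode() on the whole string would turn &#60; into a raw "<".

```php
<?php
// Hypothetical helper: decode HTML named entities, but keep the five XML ones intact.
function decode_html_named_entities(string $xml): string
{
    return preg_replace_callback('/&([A-Za-z][A-Za-z0-9]*);/', function (array $m): string {
        if (in_array($m[1], ['amp', 'lt', 'gt', 'quot', 'apos'], true)) {
            return $m[0]; // already valid XML, leave untouched
        }
        // Unknown names come back unchanged and will still make the XML parser fail.
        return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
    }, $xml);
}

$in = '<img caption="1 &lt; 2 w&aacute;ssup &#60;" />';
$xml = simplexml_load_string(decode_html_named_entities($in));
echo $xml['caption'], "\n"; // 1 < 2 wássup <
```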