PHP simplexml Entities

What's going on here?

$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
    <album>
        <img src="002.jpg" caption="w&aacute;ssup?" />
    </album>
XML;

$xml = simplexml_load_string($string);
// $xmlobj = simplexml_load_file("xml.xml"); // same thing

echo "<pre>";
var_dump($xml);
echo "</pre>";

      

Error:

Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 5: parser error: entity 'aacute' not defined

+2



5 answers


&aacute; is not an XML entity - you are thinking of HTML.

Special characters are normally used "as is" in XML. Running html_entity_decode() on the input (don't forget to specify UTF-8 as the character set) should do the trick:



$string = html_entity_decode($string, ENT_QUOTES, "utf-8");
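
As a minimal sketch of that approach, assuming the only problem in the input is HTML named entities such as &aacute;, the decoded string parses cleanly and the attribute comes back as UTF-8 text. The sample document and variable names below are illustrative. Keep in mind that html_entity_decode() also decodes &amp; and &lt;, so input that contains those needs extra care.

```php
<?php
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<album>
    <img src="002.jpg" caption="w&aacute;ssup?" />
</album>
XML;

// Turn HTML named entities into real UTF-8 characters before parsing.
$decoded = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

$xml = simplexml_load_string($decoded);

// Attributes are SimpleXMLElement objects; cast to string to read the value.
$caption = (string) $xml->img['caption'];
echo $caption, PHP_EOL;
```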

      

+14



I had this problem the other day. Any occurrence of & must be inside a CDATA section so that the parser doesn't fail. CDATA cannot appear inside an attribute value, so the caption moves into its own element:

<album>
    <img src="002.jpg" />
    <caption><![CDATA[now you can put whatever characters you need & include html]]></caption>
</album>
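
A short sketch of reading such a caption back, under the assumption that you still want the accented character rather than the literal entity text. The parser does not expand entities inside CDATA, so &aacute; arrives as plain text and has to be decoded afterwards. The LIBXML_NOCDATA flag is an optional choice made here so the CDATA section is merged into an ordinary text node.

```php
<?php
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<album>
    <img src="002.jpg" />
    <caption><![CDATA[w&aacute;ssup? & <b>more</b>]]></caption>
</album>
XML;

// LIBXML_NOCDATA merges CDATA sections into plain text nodes.
$xml = simplexml_load_string($string, 'SimpleXMLElement', LIBXML_NOCDATA);

// The CDATA content is returned untouched, entity text included.
$raw = (string) $xml->caption;

// Decode HTML entities only after parsing, on the extracted value.
$text = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

echo $raw, PHP_EOL;
echo $text, PHP_EOL;
```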

+2



You may want to look at Matt Robinson's article on an alternative method: converting named entities to numeric ones in PHP. It mentions the html_entity_decode() method (already pointed out by another answer) and some potential pitfalls:

There are two possible problems with this approach. The first is invalid entities: html_entity_decode() won't touch them, which means you'll still get XML errors. The second is encoding. I suppose it's possible that you don't actually want UTF-8. You should, because it's awesome, but maybe you have a good reason. If you don't tell html_entity_decode() to use UTF-8, it won't convert entities that don't exist in the character set you specify. If you tell it to output UTF-8 and then use something like iconv() to convert it, you'll lose any characters that aren't in the output encoding.

Also, if you find the script rather cumbersome, you can use the one used for SourceRally instead.
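
The named-to-numeric idea can be sketched roughly as follows. This is not the article's script: the function name named_entities_to_numeric() is hypothetical, the code assumes the mbstring extension for mb_ord(), and it only recognizes the HTML 4.01 entity set. Numeric references such as &#225; are valid in any XML document regardless of its encoding, which avoids the encoding pitfall described above. Unknown entities are left alone and will still trigger a parser error.

```php
<?php
// Hypothetical helper: rewrite HTML named entities as numeric references.
function named_entities_to_numeric(string $xml): string
{
    // The five entities XML predefines must stay exactly as they are.
    $keep = ['&amp;', '&lt;', '&gt;', '&quot;', '&apos;'];

    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        static function (array $m) use ($keep): string {
            if (in_array($m[0], $keep, true)) {
                return $m[0];
            }
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
            if ($char === $m[0]) {
                return $m[0]; // not a known entity: leave it untouched
            }
            // Emit the code point as a decimal character reference.
            return '&#' . mb_ord($char, 'UTF-8') . ';';
        },
        $xml
    );
}

$fixed = named_entities_to_numeric('<img caption="w&aacute;ssup? &amp; &bogus;" />');
echo $fixed, PHP_EOL;
```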

+2



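Another solution is to change

"w&aacute;ssup?" to "w&amp;aacute;ssup?"

The parser then sees the literal text &aacute; instead of an undefined entity, so the value still needs html_entity_decode() after parsing if you want the accented character.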

+1



Try this helper function, simplexml_load_entity_string() (user-defined, not part of PHP):

<?php

$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
    <album>
        <img src="002.jpg" caption="test&lt;w&aacute;ssup?" />
    </album>
XML;

$xml = simplexml_load_entity_string($string);

var_dump($xml);

function simplexml_load_entity_string($string = '')
{
    // Swap the five predefined XML entities for placeholders so that
    // html_entity_decode() only decodes the HTML-specific ones
    $string = str_replace([
        '&quot;', '&amp;', '&apos;', '&lt;', '&gt;',
    ], [
        'SPECIALquotMARK', 'SPECIALampMARK', 'SPECIALaposMARK', 'SPECIALltMARK', 'SPECIALgtMARK',
    ], $string);
    $string = html_entity_decode($string, ENT_QUOTES, "utf-8");
    $string = str_replace([
        'SPECIALquotMARK', 'SPECIALampMARK', 'SPECIALaposMARK', 'SPECIALltMARK', 'SPECIALgtMARK',
    ], [
        '&quot;', '&amp;', '&apos;', '&lt;', '&gt;',
    ], $string);

    // load xml
    return simplexml_load_string($string);
}

      

0

