UTF-8 XML file shows gibberish
I have a UTF-8 encoded XML file that was exported from a WordPress MySQL database.
Even though the file is saved as UTF-8 and declares UTF-8 encoding, I get gibberish instead of the Hebrew text that should be there. It looks like this:
™ × • × × ~ • × ª
How do I find the original encoding and convert the text back to correct Hebrew?
PHP's mb_detect_encoding($str) returns UTF-8.
I have tried all sorts of PHP encoding functions with different settings and input/output encodings, but they all just print different-looking blocks of gibberish, like this:
ÃâÃËÃâ ¢ ¢ Ä AEA
and
×× © × ž ×
... Any ideas how to do this?
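A quick illustration of why mb_detect_encoding() is no help here: text that has been UTF-8 encoded twice is still perfectly valid UTF-8, so detection reports UTF-8. This is a minimal sketch assuming the corruption was "UTF-8 bytes read as Latin-1, then saved as UTF-8 again". The ™ and • in the sample above suggest the real culprit was Windows-1252, a close relative of Latin-1, which this sketch ignores for simplicity.

```php
<?php
// Hypothetical demo: simulate the suspected corruption by reading the UTF-8
// bytes of a Hebrew word as Latin-1 and encoding them to UTF-8 a second time.
$original = "שלום";
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Still valid UTF-8, so encoding detection has nothing to complain about.
var_dump(mb_check_encoding($garbled, 'UTF-8')); // bool(true)

// Every original byte became a two-byte character: 8 bytes grew to 16.
var_dump(strlen($original), strlen($garbled));  // int(8) int(16)
```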
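// Maps each mis-decoded Latin-1 character back to the Hebrew letter that
// has the same byte value (0xE0-0xFA) in ISO-8859-8.
function convert($str) {
    $hebrew = array("א", "ב", "ג", "ד", "ה", "ו", "ז", "ח", "ט", "י", "כ", "ל", "מ", "נ", "ס", "ע", "פ", "צ", "ק", "ר", "ש", "ת", "ך", "ם", "ן", "ף", "ץ");
    $gibberish = array("à", "á", "â", "ã", "ä", "å", "æ", "ç", "è", "é", "ë", "ì", "î", "ð", "ñ", "ò", "ô", "ö", "÷", "ø", "ù", "ú", "ê", "í", "ï", "ó", "õ");
    return str_replace($gibberish, $hebrew, $str);
}
// $gibberish_string holds the garbled input. utf8_encode() is deprecated as of
// PHP 8.2; converting from ISO-8859-1 with mb_convert_encoding() is equivalent.
$hebrew_string = convert(mb_convert_encoding($gibberish_string, 'UTF-8', 'ISO-8859-1'));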
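The two arrays above appear to line up with the Hebrew letters of ISO-8859-8 (א is byte 0xE0, ת is 0xFA) as they look when read as Latin-1. If the input really is raw ISO-8859-8 bytes, a minimal sketch of the same conversion is a single mbstring call. The function name hebrew_from_iso88598 is made up for illustration.

```php
<?php
// Hypothetical helper: decode raw ISO-8859-8 (Hebrew) bytes to UTF-8.
// Covers the same 27 letters as the convert() table, plus ASCII and the
// rest of the ISO-8859-8 repertoire.
function hebrew_from_iso88598(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-8');
}

// 0xF9 0xEC 0xE5 0xED spells "shalom" in ISO-8859-8.
echo hebrew_from_iso88598("\xF9\xEC\xE5\xED"), "\n"; // prints שלום
```

Letting the library decode the whole charset avoids maintaining a hand-written table and keeps the two arrays from drifting out of sync.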
This is very similar to this question.
From what I can see, this is a garbled Unicode string in which each original character has been encoded as two characters.
The code above simply maps the garbled characters back to the original Hebrew letters. It is only an example and very simplistic in approach, but it should help you get there.
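The "each character became two" diagnosis fits the sample in the question: × is byte 0xD7, the lead byte of every Hebrew letter in UTF-8, displayed as Windows-1252, and ™, • and ª are the second bytes. Below is a minimal sketch of reversing that, assuming the text is UTF-8 that was decoded as Windows-1252 and re-encoded as UTF-8 (PHP 7.4+ for mb_str_split). It also assumes the five bytes Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) survived as U+0081 to U+009D, which is how MySQL's latin1 charset treats them. The name undo_double_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: turn doubly encoded UTF-8 back into the original
// UTF-8 bytes. Returns null if the input does not look like this mojibake.
function undo_double_utf8(string $mojibake): ?string
{
    // Windows-1252 characters in 0x80-0x9F that differ from Latin-1,
    // keyed by Unicode code point, valued by the original byte.
    static $cp1252 = [
        0x20AC => 0x80, 0x201A => 0x82, 0x0192 => 0x83, 0x201E => 0x84,
        0x2026 => 0x85, 0x2020 => 0x86, 0x2021 => 0x87, 0x02C6 => 0x88,
        0x2030 => 0x89, 0x0160 => 0x8A, 0x2039 => 0x8B, 0x0152 => 0x8C,
        0x017D => 0x8E, 0x2018 => 0x91, 0x2019 => 0x92, 0x201C => 0x93,
        0x201D => 0x94, 0x2022 => 0x95, 0x2013 => 0x96, 0x2014 => 0x97,
        0x02DC => 0x98, 0x2122 => 0x99, 0x0161 => 0x9A, 0x203A => 0x9B,
        0x0153 => 0x9C, 0x017E => 0x9E, 0x0178 => 0x9F,
    ];

    $bytes = '';
    foreach (mb_str_split($mojibake, 1, 'UTF-8') as $char) {
        $cp = mb_ord($char, 'UTF-8');
        if ($cp < 0x100) {
            $bytes .= chr($cp);          // Latin-1 range: code point == byte
        } elseif (isset($cp1252[$cp])) {
            $bytes .= chr($cp1252[$cp]); // Windows-1252 extra in 0x80-0x9F
        } else {
            return null;                 // e.g. real Hebrew: not mojibake
        }
    }
    // The recovered bytes must themselves be valid UTF-8.
    return mb_check_encoding($bytes, 'UTF-8') ? $bytes : null;
}

// "×©×œ×•×" followed by U+009D is "shalom" after double encoding.
echo undo_double_utf8("\u{00D7}\u{00A9}\u{00D7}\u{0153}\u{00D7}\u{2022}\u{00D7}\u{009D}"), "\n"; // prints שלום
```

Returning null for anything outside the two byte-mapping ranges means text that is already correct Hebrew is left alone instead of being mangled a second time.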