[PHP] Fix for broken characters when using file_get_contents() on sites in the ISO-8859-1 charset
While using Simple HTML DOM Parser at work, I noticed that my code broke when it parsed sites encoded in ISO-8859-1.
I got strings like this:
HKM Lederreitstiefel CROCO Langl�nge/enge Weite
After some googling I found this solution:
mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true))
It fixed the previous case:
HKM Lederreitstiefel CROCO Langlänge/enge Weite
but it broke another one:
["currency"] => string(2) ""
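To see what the detect-and-convert call does by itself, here is a minimal sketch. The helper name to_utf8 and the sample strings are my own assumptions for illustration, and so is the fallback to ISO-8859-1 when detection fails. It needs the mbstring extension.

```php
<?php
// Hypothetical helper (not from the original snippet): normalise fetched content to UTF-8.
function to_utf8(string $content): string
{
    // Strict mode: 'UTF-8' is returned only if the bytes are valid UTF-8,
    // otherwise detection falls through to ISO-8859-1.
    $source = mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true);
    if ($source === false) {
        $source = 'ISO-8859-1'; // assumption: treat undetectable input as Latin-1
    }
    return mb_convert_encoding($content, 'UTF-8', $source);
}

$latin1 = "Langl\xE4nge";   // "Langlänge" as ISO-8859-1 bytes
$utf8   = "Langl\u{E4}nge"; // the same word as UTF-8

var_dump(to_utf8($latin1) === $utf8); // ISO-8859-1 input is converted
var_dump(to_utf8($utf8) === $utf8);   // valid UTF-8 input is left unchanged
```

Listing UTF-8 first matters here: almost any byte sequence is valid ISO-8859-1, so if it came first it would always match.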
In the end I changed it to:
mb_convert_encoding(
    mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true)),
    'HTML-ENTITIES',
    'UTF-8'
)
and got strings like this:
Kard&auml;tsche -Flower- f&uuml;r Kinder
That is better, but still not right. After adding html_entity_decode()
calls in the right places, I finally got:
Kardätsche -Flower- für Kinder
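The last two steps can be sketched roughly like this. The post does not say exactly where the decoding goes, so the sketch assumes html_entity_decode() is applied to each value after it has been extracted from the parsed page. The helper name decode_value and the sample string are hypothetical. Note that PHP 8.2 and later emit a deprecation notice for the 'HTML-ENTITIES' encoding in mbstring, so treat this as an illustration of the approach rather than a recommendation.

```php
<?php
// Hypothetical helper: turn entities in an extracted value back into UTF-8 text.
function decode_value(string $value): string
{
    return html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Assumed sample: page content that has already been converted to UTF-8.
$utf8 = "Kard\u{E4}tsche -Flower- f\u{FC}r Kinder";

// Step from the post: non-ASCII characters become HTML entities such as &auml;.
// (Deprecated as of PHP 8.2, where this call raises a deprecation notice.)
$entities = mb_convert_encoding($utf8, 'HTML-ENTITIES', 'UTF-8');

// ... the HTML parser would run on $entities here ...

// Decode the extracted value back to readable UTF-8.
$plain = decode_value($entities);
var_dump($plain === $utf8);
```

The intermediate string is pure ASCII, so the parser never sees multi-byte characters. The decode step then restores them only in the values you actually use.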
This solution rescued my project, which parses websites in different encodings.
I hope it helps somebody. Have a nice day :-)
P.S. Many thanks to the people here: https://stackoverflow.com/questions/2236668/file-get-contents-breaks-up-utf-8-characters