[PHP] Fix for UTF-8 characters when using file_get_contents() for sites in ISO-8859-1 charset

zavz9t (50)in #php • 8 years ago

While working with Simple HTML DOM Parser, I noticed that my code broke when it parsed sites served in ISO-8859-1 encoding.

I got strings like this:

HKM Lederreitstiefel CROCO Langl�nge/enge Weite

After some googling I found this solution:

mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true))

It fixed the previous case:

HKM Lederreitstiefel CROCO Langlänge/enge Weite

But it broke another one:

["currency"] => string(2) ""
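The detect-and-convert attempt above can be sketched as a small helper. This is only a minimal sketch: the function name `toUtf8` and the fallback to ISO-8859-1 when detection fails are my assumptions, not part of the original one-liner.

```php
<?php
// Hypothetical helper (name is an assumption): normalise fetched content to UTF-8.
function toUtf8(string $content): string
{
    // Strict mode (third argument) rejects any candidate encoding
    // for which the byte sequence is not valid.
    $from = mb_detect_encoding($content, ['UTF-8', 'ISO-8859-1'], true);

    if ($from === false) {
        // Assumption: treat undetectable input as ISO-8859-1.
        $from = 'ISO-8859-1';
    }

    return mb_convert_encoding($content, 'UTF-8', $from);
}

// "ä" stored as the single ISO-8859-1 byte 0xE4, which is not valid UTF-8 here.
$latin1 = "Langl\xE4nge";
echo toUtf8($latin1), PHP_EOL;
```

In real use the argument would be the result of file_get_contents() for the page being parsed.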

Finally I changed it to:

mb_convert_encoding(
    mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true)),
    'HTML-ENTITIES',
    'UTF-8'
)

And got strings like this:

Kard&auml;tsche -Flower- f&uuml;r Kinder
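One caveat about the entity-encoding step: the 'HTML-ENTITIES' pseudo-encoding has been deprecated in mbstring since PHP 8.2. A possible substitute, assuming numeric entities are acceptable in place of named ones such as &auml;, is mb_encode_numericentity(). The helper name `encodeNonAscii` below is hypothetical.

```php
<?php
// Hypothetical helper (name is an assumption): replace every non-ASCII
// character in a UTF-8 string with a decimal numeric HTML entity.
function encodeNonAscii(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // It covers everything from U+0080 up to U+10FFFF and leaves ASCII
    // (including HTML markup characters) untouched.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

echo encodeNonAscii("Kard\u{E4}tsche -Flower- f\u{FC}r Kinder"), PHP_EOL;
```

Numeric entities are decoded by html_entity_decode() in the same way as named ones, so the later decode step is unaffected.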

The entity-encoded output is better, but still not right. After adding html_entity_decode() in the right places, I finally got:

Kardätsche -Flower- für Kinder
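The decode step could look roughly like this. The flags ENT_QUOTES | ENT_HTML5 and the explicit 'UTF-8' target are my choices for the sketch, and the helper name `decodeEntities` is hypothetical. Where exactly to call it depends on which values you read out of the parser.

```php
<?php
// Hypothetical helper (name is an assumption): convert entity-encoded text
// extracted from the parsed document back into plain UTF-8.
function decodeEntities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decodeEntities('Kard&auml;tsche -Flower- f&uuml;r Kinder'), PHP_EOL;
```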

This solution rescued my project, which parses websites in different encodings.

I hope it helps somebody. Have a nice day :-)

P.S. Many thanks to the folks here: https://stackoverflow.com/questions/2236668/file-get-contents-breaks-up-utf-8-characters

#blog #programming #charset #ua
