Kamil Dudka

CharsetDetector (PHP)


CharsetDetector is a PHP component that detects the charset of a text automatically. Detection is based on static analysis of the input text. Once the charset is detected, the text can be automatically recoded to a requested charset. Recoding requires the iconv PHP extension. Charset detection itself does not.
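The detection and recoding steps described above can be illustrated with a minimal sketch. It is not CharsetDetector's code or API. The function names `guess_czech_charset()` and `recode_to()` are hypothetical, and the "static analysis" is reduced to a simple byte heuristic that only separates UTF-8, ISO 8859-2 and Windows-1250 in Czech text. Only the recoding step calls `iconv()`.

```php
<?php
// Hypothetical helpers, NOT the CharsetDetector API: a minimal sketch that
// tells UTF-8, ISO-8859-2 and Windows-1250 apart in Czech text.

function guess_czech_charset(string $text): string
{
    // preg_match() with the /u modifier returns false for invalid UTF-8,
    // so any text that passes is treated as UTF-8 (this includes pure ASCII).
    if (preg_match('//u', $text) === 1) {
        return 'UTF-8';
    }

    // Byte values of the letters Š, Ť, Ž, š, ť, ž in each single-byte charset.
    // In ISO-8859-2 the Windows-1250 values are C1 control codes, rare in text.
    $isoBytes = [0xA9, 0xAB, 0xAE, 0xB9, 0xBB, 0xBE];
    $winBytes = [0x8A, 0x8D, 0x8E, 0x9A, 0x9D, 0x9E];

    $isoHits = 0;
    $winHits = 0;
    // count_chars() mode 1: byte value => frequency, for bytes that occur.
    foreach (count_chars($text, 1) as $byte => $count) {
        if (in_array($byte, $isoBytes, true)) {
            $isoHits += $count;
        } elseif (in_array($byte, $winBytes, true)) {
            $winHits += $count;
        }
    }

    // Ties (including no hits at all) fall back to ISO-8859-2.
    return $winHits > $isoHits ? 'WINDOWS-1250' : 'ISO-8859-2';
}

// Detection needs no extension; recoding relies on iconv().
function recode_to(string $text, string $targetCharset): string
{
    $sourceCharset = guess_czech_charset($text);
    $recoded = iconv($sourceCharset, $targetCharset, $text);
    if ($recoded === false) {
        throw new RuntimeException("Cannot recode from $sourceCharset to $targetCharset");
    }
    return $recoded;
}
```

For example, `recode_to("\x9A\x9D\x9E", 'UTF-8')` treats the bytes as Windows-1250 and returns `šťž` encoded in UTF-8. Counting only six letters keeps the sketch short; a practical detector would weigh many more characters.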

The component has a simple interface and is easy to use. It was tested on samples of Czech text encoded in ISO 8859-2, Windows-1250 and UTF-8, and achieved a 100% hit ratio whenever a text used a single, uniform charset.

CharsetDetector is, of course, not limited to any particular charset or language. Detection rules can be set arbitrarily, even at run time. A simple example of CharsetDetector usage can be found in a short tutorial.
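One way to picture detection rules that can be changed at run time is a table of characteristic bytes per charset that the caller fills in. The class below is a hypothetical sketch, not CharsetDetector's interface: `ByteRuleDetector` and `setRule()` are invented names. It scores each registered charset by how often its bytes occur in the input and picks the highest score.

```php
<?php
// Hypothetical rule-based detector (NOT the CharsetDetector API).
// Each rule maps a charset name to byte values that are characteristic of it.
final class ByteRuleDetector
{
    /** @var array<string, list<int>> charset name => characteristic byte values */
    private array $rules = [];

    // Rules can be added or replaced at any time, i.e. at run time.
    public function setRule(string $charset, array $bytes): void
    {
        $this->rules[$charset] = $bytes;
    }

    // Returns the charset whose characteristic bytes occur most often, or null
    // when no rule matches. On equal scores the earlier-registered rule wins.
    public function detect(string $text): ?string
    {
        $freq = count_chars($text, 1); // byte value => occurrence count
        $best = null;
        $bestScore = 0;
        foreach ($this->rules as $charset => $bytes) {
            $score = 0;
            foreach ($bytes as $byte) {
                $score += $freq[$byte] ?? 0;
            }
            if ($score > $bestScore) {
                $best = $charset;
                $bestScore = $score;
            }
        }
        return $best;
    }
}

// Example rules for Czech: bytes of Š, Ť, Ž, š, ť, ž in each charset.
$detector = new ByteRuleDetector();
$detector->setRule('ISO-8859-2', [0xA9, 0xAB, 0xAE, 0xB9, 0xBB, 0xBE]);
$detector->setRule('WINDOWS-1250', [0x8A, 0x8D, 0x8E, 0x9A, 0x9D, 0x9E]);
```

Because the rules are plain data, supporting another charset or language only requires another `setRule()` call, with no change to the detection code.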

Source code

Documentation

Acknowledgement

Otakar Pinkas reported a bug that broke initialization of the CharsetDetector class when a non-default argument was passed to its constructor.