CharsetDetector (PHP)
CharsetDetector is a PHP component that detects the charset of a text automatically. Detection is based on static analysis of the input text. Once the charset has been detected, the text can be automatically recoded to a requested charset. Recoding requires the iconv PHP extension; charset detection itself does not.
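The recoding step can be illustrated with PHP's own iconv() function. This is a minimal sketch of recoding in general, not the component's code: the helper name recode_text() is hypothetical and is not part of CharsetDetector.

```php
<?php
// Hypothetical helper (not part of CharsetDetector): recode $text
// from charset $from to charset $to with the iconv extension.
function recode_text(string $text, string $from, string $to): string
{
    if (!function_exists('iconv')) {
        throw new RuntimeException('The iconv extension is required for recoding.');
    }
    // iconv() returns false when the input cannot be converted.
    $out = @iconv($from, $to, $text);
    if ($out === false) {
        throw new RuntimeException("Cannot recode from $from to $to.");
    }
    return $out;
}

// The Czech letter "š" is the single byte 0xB9 in ISO 8859-2
// and the two bytes 0xC5 0xA1 in UTF-8.
echo bin2hex(recode_text("\xB9", 'ISO-8859-2', 'UTF-8')), "\n"; // prints c5a1
```

The charset names accepted by iconv() depend on the underlying iconv implementation, so the names used here are an assumption about the target platform.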
The component has a simple interface and is easy to use. It was tested on pieces of Czech text encoded in ISO 8859-2, Windows 1250 and UTF-8. The hit ratio was 100% when each text used a single charset throughout.
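To give an idea of what static analysis of the input text can look like for these three charsets, here is a rough sketch. It is not the CharsetDetector algorithm or API: the function guess_czech_charset() and its rules are assumptions made for illustration. It relies on the fact that the letters Š, Ť, Ž, š, ť and ž are encoded with different bytes in ISO 8859-2 and Windows 1250.

```php
<?php
// Hypothetical sketch (not the CharsetDetector API): guess which of
// three charsets a piece of Czech text is encoded in.
function guess_czech_charset(string $text): string
{
    // An empty pattern with the /u modifier matches only when the
    // subject is well-formed UTF-8 (plain ASCII also qualifies).
    if (preg_match('//u', $text) === 1) {
        return 'UTF-8';
    }
    // Bytes of Š Ť Ž š ť ž in each single-byte charset.
    $win = "\x8A\x8D\x8E\x9A\x9D\x9E"; // Windows 1250
    $iso = "\xA9\xAB\xAE\xB9\xBB\xBE"; // ISO 8859-2
    $winHits = 0;
    $isoHits = 0;
    // count_chars(..., 1) maps each byte value present to its frequency.
    foreach (count_chars($text, 1) as $byte => $n) {
        if (strpos($win, chr($byte)) !== false) {
            $winHits += $n;
        } elseif (strpos($iso, chr($byte)) !== false) {
            $isoHits += $n;
        }
    }
    // A tie falls back to ISO 8859-2; this choice is arbitrary.
    return $winHits > $isoHits ? 'Windows-1250' : 'ISO-8859-2';
}

// "žluťoučký" encoded in Windows 1250.
echo guess_czech_charset("\x9Elu\x9Dou\xE8k\xFD"), "\n"; // prints Windows-1250
```

A heuristic of this kind only works when the whole text uses one charset, which matches the condition under which the 100% hit ratio was observed.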
CharsetDetector is not limited to any particular charset or language. Detection rules can be set arbitrarily, even at run time. A simple example of CharsetDetector usage can be found in the short tutorial:
Source code
Documentation
Acknowledgement
Otakar Pinkas reported a bug that broke initialization of the CharsetDetector class when a non-default argument was passed to its constructor.