Handling the ISO-8859-2 character set in mail

We recently received a report that the email fetching feature within our Joomla Issue Tracker component wasn’t handling the subject header and email body correctly for the ISO-8859-2 character set. This character set is used by a number of Eastern European countries, so we were interested in resolving the problem if we possibly could.

We tend to use the standard PHP imap routines, and it was immediately obvious how we should handle the subject: by adding a call to the imap_mime_header_decode function. This worked well and was a very quick fix.
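As a rough illustration of the subject fix, the sketch below decodes a raw Subject header and converts each decoded segment to UTF-8. The helper name decode_subject is hypothetical and is not the code in our component, and the conversion step assumes the mbstring extension is available and recognises the character set named in the header.

```php
<?php
// Hypothetical helper: decode a MIME-encoded Subject header to UTF-8.
// Assumes the imap and mbstring extensions are loaded.
function decode_subject(string $rawSubject): string
{
    // imap_mime_header_decode() returns a list of objects, each with
    // a ->charset and a ->text property, or false on failure.
    $parts = imap_mime_header_decode($rawSubject);
    if ($parts === false) {
        return $rawSubject;
    }

    $decoded = '';
    foreach ($parts as $part) {
        if (strcasecmp($part->charset, 'default') === 0) {
            // Segments that were not encoded are reported as 'default';
            // pass them through unchanged.
            $decoded .= $part->text;
        } else {
            // Convert from the declared character set (e.g. ISO-8859-2).
            $decoded .= mb_convert_encoding($part->text, 'UTF-8', $part->charset);
        }
    }
    return $decoded;
}
```

A header such as `=?ISO-8859-2?Q?=A3=F3d=BC?=` would be passed in as $rawSubject, typically taken from the subject field of the message headers.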

Unfortunately, finding a solution for the email body was not so simple. An extensive search on the web turned up nothing of immediate use; it was almost as if the topic was of no interest to anyone. After much searching we discovered that the character set is stored in the mail structure as a parameter, so we could write some code to extract it. The structure is available from the imap_fetchstructure function, so we could perform a simple loop through its parameters, looking for the one whose ‘attribute’ is ‘charset’ and then extracting its ‘value’, which in our case was ‘ISO-8859-2’.
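The loop described above might look something like the following. This is a minimal sketch rather than our actual code: find_charset is a hypothetical name, and it only inspects the top-level structure object, so for a multipart message the same check would have to be applied to the relevant entry in the structure's parts.

```php
<?php
// Hypothetical helper: look for the charset parameter in the object
// returned by imap_fetchstructure(). Returns null if none is declared.
function find_charset(object $structure): ?string
{
    // 'ifparameters' is true only when the 'parameters' array is populated.
    if (empty($structure->ifparameters)) {
        return null;
    }
    foreach ($structure->parameters as $param) {
        // Attribute names can arrive in either case, so compare loosely.
        if (strcasecmp($param->attribute, 'charset') === 0) {
            return $param->value;
        }
    }
    return null;
}

// Typical use, where $mailbox and $msgNo are assumed to come from an
// existing imap_open() connection:
// $charset = find_charset(imap_fetchstructure($mailbox, $msgNo));
```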

The next step was to decide how we could use this information. We were already handling the various transfer encodings, and we could see from tests that our emails had an encoding of type ‘4’, which represents ‘QUOTED-PRINTABLE’. The output from the imap_qprint function was obviously wrong, since the decoded text was still in the original character set, so we had identified where we had to do some work. To cut a long story short, we inserted a call to the ‘mb_convert_encoding’ function to convert the output from the detected character set to ‘UTF-8’, which immediately resulted in sensible text output.
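Put together, the decode-then-convert step can be sketched as below. To keep the example runnable without a mailbox connection it uses PHP's core quoted_printable_decode and base64_decode functions as stand-ins for imap_qprint and imap_base64. The helper name body_to_utf8 is hypothetical, and the sketch assumes the character set is one that mbstring recognises.

```php
<?php
// Hypothetical helper: undo the transfer encoding, then convert the
// result to UTF-8. $encoding is the numeric 'encoding' field from
// imap_fetchstructure(); $charset is the value of its charset parameter.
function body_to_utf8(string $rawBody, int $encoding, ?string $charset): string
{
    switch ($encoding) {
        case 3: // BASE64
            $decoded = base64_decode($rawBody);
            break;
        case 4: // QUOTED-PRINTABLE
            $decoded = quoted_printable_decode($rawBody);
            break;
        default: // 7BIT, 8BIT, BINARY, OTHER: nothing to undo
            $decoded = $rawBody;
    }

    // Nothing to convert if no charset was declared or it is already UTF-8.
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        return $decoded;
    }
    return mb_convert_encoding($decoded, 'UTF-8', $charset);
}
```

For our test emails the call would be along the lines of `body_to_utf8($body, 4, 'ISO-8859-2')`. Doing the conversion after the transfer decoding matters, because the quoted-printable escapes represent bytes in the original character set.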

We next performed some clean-up on the output, such as removing multiple blank lines, to produce a ‘version’ of the email body that we could then save in our database.
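The exact clean-up will vary from one mail source to another, but a sketch of the kind of tidying we mean is shown below. The function name tidy_body and the particular steps chosen (normalising line endings, trimming trailing whitespace, collapsing blank lines) are illustrative assumptions rather than a description of our component's code.

```php
<?php
// Hypothetical helper: tidy a decoded, UTF-8 email body before saving it.
function tidy_body(string $text): string
{
    // Normalise CRLF and lone CR line endings to LF.
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    // Strip trailing spaces and tabs from every line.
    $text = preg_replace('/[ \t]+$/m', '', $text);
    // Collapse runs of blank lines into a single blank line.
    $text = preg_replace('/\n{3,}/', "\n\n", $text);
    // Drop leading and trailing whitespace around the whole body.
    return trim($text);
}
```

These patterns only touch ASCII whitespace bytes, so they are safe to run on text that has already been converted to UTF-8.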

Will this handle other character sets? We do not know, since one has to have examples to be able to test against, and there have been no requests so far. At least it provides a starting point if there is ever the need.

The whole exercise was interesting and not too taxing. The only real surprise was that no one else had apparently had a similar problem. In the process we also learnt a lot about Eastern European character sets, which may prove useful if and when we implement multi-language support in the component, something that has been requested but which we have not yet got around to investigating in depth.