Improving support for non-latin languages
This year Mapnik was not selected as a Google Summer of Code Project, but as Mapnik is OpenStreetMap’s main renderer I was selected to work on Mapnik text rendering by the OpenStreetMap project.
Current state of Mapnik’s text rendering
Mapnik was initially designed with only left-to-right text (i.e. like this text) in mind. Later support for basic rendering of right-to-left (RTL) languages was added. But the code is complicated and there are a lot of problems with it.
Currently rendering is done by reordering RTL text to LTR display order, but this step happens early in the rendering process before trying to find a placement and before finding line breaks. To illustrate why this is bad I will describe two problems that are actually quite difficult to solve:
After reordering the text it is stored backwards in memory.
THIS IS AN EXAMPLE becomes
ELPMAXE NA SI SIHT. When we try to find line breaks, in the LTR example it might happen between
AN. But if we break the reversed text and render the first part first we end up with
ELPMAXE NA SI SIHT
This is of course wrong because the first word is placed on the second line. There is a work around in place in mapnik which places the last line first when RTL text is detected, but this code doesn’t work with mixed RTL and LTR text. The correct solution would be to perform line breaking first and then reverse the text. This is not possible with the current rendering stack in Mapnik.
Mapnik’s placement finder is able to detect when more then 50% of all characters are placed upside down on a line of text and then decides it should flip all characters. When it does this it processes all characters in reverse order. This works well for most scripts but sometimes there is more than a single character per glyph. Now when you reverse the processing order the rendering system can’t combine them to the right glyph. This leads to distorted, unreadable rendering.
What’s going to happen?
I will rewrite most of the placement finder code and other parts of the rendering stack to make it more robust when handling RTL text. I also will try to fix as many bugs as possible. I will try to keep the code readable and maintainable. If possible most work should be done by a specialized external library. Our own code can never be as good as a specialized library because text rendering is only a small part of Mapnik and so we can’t spend the same amount of work on it as developers of Unicode libraries can.
What has been done so far?
I investigated different options as to which library to use. The results are documented on the OSM-Project page.
My conclusion is that Pango is the best solution, as it does most of the work necessary to get a good rendering:
- Line breaking
- Dealing with different formatting options, fonts, etc.
- Shaping using HarfBuzz (selecting which Glyph to use for a character depending on font, language and context. e.g. a different glyph might be used for the same character depending on where in the word it appears)
- Handles text rendered from top to bottom
I need help from the OpenStreetMap and Mapnik community to provide examples of where non-latin text is rendered incorrectly. It would be good if a description of what is wrong could be provided too. Thank you in advance.Posted by Hermann Kraus on 29 May 2012.