incubator-ooo-dev mailing list archives

From Herbert Duerr <...@apache.org>
Subject Re: i18nregexp replaced with ICU regexp => heads up
Date Mon, 23 Jan 2012 14:01:32 GMT
Replying to myself to give a status update.

> FYI, I'm also considering to extend the ICU regexp matcher with the
> bracket extension mentioned above if there is a chance that it gets into
> upstream ICU.

To keep the upgrade experience smooth, I committed r1234777 so that AOO
now emulates the \< and \> expressions by mapping them to \b, which
matches word boundaries.
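
To illustrate the idea, here is a minimal sketch of such a mapping in Python. It is not the code from r1234777; the function name emulate_word_anchors is hypothetical, and Python's re module only stands in for the ICU matcher. The sketch walks the pattern escape by escape so that an escaped backslash followed by < or > is left alone, and it makes no attempt to handle character classes.

```python
import re


def emulate_word_anchors(pattern: str) -> str:
    # Hypothetical helper: rewrite \< and \> to \b and copy every other
    # escape sequence (including an escaped backslash) through unchanged.
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "<>":
                out.append("\\b")      # both anchors collapse to \b
            else:
                out.append(ch + nxt)   # keep the escape pair as it was
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    translated = emulate_word_anchors(r"\<cat\>")
    print(translated)
    print(re.findall(translated, "cat concat cats"))
```

Note that \b matches at both the start and the end of a word, so the emulation is slightly looser than separate \< and \> anchors would be.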

Using ICU's regular expression engine brings real benefits for
conforming to the Unicode standard. For example, finding word boundaries
now respects UAX#29 (Unicode Standard Annex #29: Text Segmentation),
whereas the old engine used a rather simple heuristic. In particular,
the old engine couldn't find word boundaries in scripts like Thai that
don't use spaces or punctuation symbols for separating words.

I'll update the release notes accordingly.

Herbert
