incubator-ooo-dev mailing list archives

From Rob Weir <>
Subject Re: i18nregexp replaced with ICU regexp => heads up
Date Wed, 04 Jan 2012 21:23:55 GMT
On Wed, Jan 4, 2012 at 3:31 PM, Dennis E. Hamilton
<> wrote:
> I think there are three concerns regarding the substitution of ICU regexp
> (a move that I favor):
>  1. There are whatever the internal usages are within AOO.  Those don't
>     need to be documented externally.
>  2. There are the places where users may specify regular expressions,
>     especially in Calc and macro procedures.  For those, there needs to be
>     precise documentation as well as any indication of what the impact is
>     in Release Notes.
>  3. There are the occurrences of regular expressions in the ODF documents
>     that are consumed and produced and that are covered in the ODF
>     1.0/1.1/1.2 specifications.  Excluding macros (which are not covered in
>     any version of ODF), these present potential interoperability issues
>     among producers and consumers of the same (e.g., OpenOffice) and
>     different lineage.
>     Here there needs to be enough documentation of what ODF leaves
>     discretionary so that interoperability can be assured.

Hmmm... regular expressions are used in table filters, for example the
filter feature in Calc.  Are there any others that we should be
thinking about?

The nice thing is that ODF defines the syntax per Unicode Regular Expressions:

So by moving to ICU, OpenOffice might for the first time actually
conform to that part of the spec!

But you are correct that we need to make sure this is in the release
notes, so that users are not confused by this sudden conformance.

(BTW, is there a draft of the 3.4 release notes on a wiki or someplace
else we should be updating?)

> So there are these three cases, where perhaps the most urgent
> documentation is (2) and then (3) because in the latter case, there may be
> unexpected or even un-noticed unexpected results and users won't know what
> to do.  That will not encourage trust.

This is one example of hundreds. It caught your eye -- and that is a
good thing -- because it has an obvious relation with ODF.  But we
should also think about how we can get all the other similar

For example, a test script that loads a bunch of legacy documents in
OOo 3.3.0, grabs a screen shot, and saves it as a bitmap (or saves as
PDF).  Repeat with a 3.4 build.  Automate a diff of the bitmaps or PDFs.
Flag any differences for manual review.  You can be certain that vendors
who take version-to-version interop very seriously do something like
this.  I wonder how hard it would be to do something similar?  (Of
course, visual layout and appearance is only one aspect of interop,
but it is an important one.)
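
A minimal sketch of the "diff and flag" step, assuming the renderings
from each build have already been exported into two directories under
matching file names.  The export itself is not shown, and the names
flag_differences, old_dir and new_dir are hypothetical.  An exact byte
comparison is the crudest possible diff; it probably suits bitmaps
better than PDFs, since PDF output often embeds a creation date.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def flag_differences(old_dir: Path, new_dir: Path) -> list[str]:
    """List file names that differ between two builds' renderings.

    A name is flagged when it exists in only one directory, or when
    the two files with that name are not byte-identical.
    """
    old_names = {p.name for p in old_dir.iterdir() if p.is_file()}
    new_names = {p.name for p in new_dir.iterdir() if p.is_file()}

    # Renderings produced by only one of the two builds.
    flagged = set(old_names ^ new_names)

    # Renderings produced by both builds but with different content.
    for name in old_names & new_names:
        if file_digest(old_dir / name) != file_digest(new_dir / name):
            flagged.add(name)

    return sorted(flagged)
```

Anything the function returns would go to a person for manual review; an
empty list only says the bytes match, not that the layout is right.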

>  - Dennis
> -----Original Message-----
> From: RGB ES []
> Sent: Wednesday, January 04, 2012 05:44
> To:
> Subject: Re: i18nregexp replaced with ICU regexp => heads up
> 2012/1/4 Herbert Duerr <>
>> On 03.01.2012 19:13, RGB ES wrote:
>>> Sorry for reactivating this old thread, but I have a question about the
>>> new regexp engine: it seems that some regular expressions do not work
>>> any more on AOO test builds. For example, on OOo 3.3 you can use
>>> \<[0-9]+[,|\.][0-9]*\>
>>> to find decimal numbers no matter if the decimal separator is a comma or
>>> a dot (the expression will find 125.25 and 1253,586) but this expression
>>> does not work on AOO builds.
>> Following up myself, AOO will find strings like "<125.25>" and
>> "<1253,586>" for the example regexp you provided, but it will not find
>> "125.25" and "1253,586".
>> http://www.regular-****html<> mentions the \< and \> that are used in
>> your regexp example as non-standard extensions to the syntax. By using
>> the new engine we are closer to the reference
>> http://www.regular-****html<>
>> Herbert
> Thanks! I think we will need to make this change clearer to users:
> many old documents with macros that rely on the old syntax will not work on
> AOO 3.4.
> Also, I think I found a problem. Suppose you have a text like "He heard
> quiet quiet steps". By using a regexp like
> (\w+) \1\b
> (notice the space before the \1) you'll find the repeated word without
> problems, but if you use $1 on "Replace with" instead of obtaining "He
> heard quiet steps" you get "He heard  steps", with two spaces between
> "heard" and "steps": the reference is not inserted!
> Ricardo
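
The two regexp cases quoted above can be tried out in a small sketch.
It uses Python's re module purely as a stand-in: this is not the ICU
engine, but it also treats \< and \> as literal angle brackets, which
matches the behaviour Herbert describes.  The \b rewrite is an assumed
replacement for the old word-boundary extensions, not something the
thread settled on, and Python writes the group reference in the
replacement as \1 where the Find & Replace dialog uses $1.

```python
import re

text = "values: 125.25 and 1253,586 and <42,0>"

# Old-style pattern from the thread: \< and \> were word-boundary
# extensions in the previous engine.  Here they are literal < and >.
old_style = r"\<[0-9]+[,|\.][0-9]*\>"

# Assumed rewrite using the standard \b word boundary instead.
with_boundaries = r"\b[0-9]+[,.][0-9]*\b"

literal_hits = re.findall(old_style, text)
boundary_hits = re.findall(with_boundaries, text)

# Backreference case: collapse a repeated word into a single copy.
sentence = "He heard quiet quiet steps"
deduplicated = re.sub(r"(\w+) \1\b", r"\1", sentence)

print(literal_hits)
print(boundary_hits)
print(deduplicated)
```

With a working group reference the last line gives the sentence Ricardo
expected, "He heard quiet steps", rather than a gap between "heard" and
"steps".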
