From: RGB ES
To: ooo-dev@incubator.apache.org
Date: Tue, 3 Jan 2012 19:13:39 +0100
Subject: Re: i18nregexp replaced with ICU regexp => heads up
In-Reply-To: <4E85BF42.2020105@apache.org>

2011/9/30 Herbert Duerr

> Hi,
>
> for removing "category X excluded licenses" from Apache OpenOffice I
> replaced the formerly used LGPL licensed module i18nregexp with the regular
> expression engine of module ICU which is already widely used in OpenOffice.
>
> The replacement fixes a lot of problems: e.g. in a text "abcabc" trying to
> "find all backwards" for "b" resulted in it only finding the last "b", now
> it actually finds all of them. It also introduces some changes, e.g.
> i18nregexp had two modes "classic" and "extended" regexp whereas the ICU
> based engine treats all patterns as extended-regexp.
>
> I18nregexp used an approach where it transliterated and compared each
> codepoint pair of the pattern and text string. The new engine does the
> transliteration only once per pattern and text string. This is much faster,
> but it only works because the transliteration was tweaked to preserve the
> special regexp control characters.
>
> The reporters of any issues in the lists below are encouraged to check the
> problems they saw with the new engine.
> https://issues.apache.org/ooo/buglist.cgi?quicksearch=regexp
> https://issues.apache.org/ooo/buglist.cgi?quicksearch=regular\expression
> Please make sure to have the "More Options -> Regular Expressions"
> checkbox activated for testing.
>
> I'm afraid the regexp replacement resulted in changes mostly for Japanese
> users, because a lot of non-trivial transliterations are active there. For
> reference I'm enumerating the active rules: "ProlongedSoundMark",
> "IterationMark", "Ignore-Width", "BaFa", "SeZe", "HyuByu",
> "IandEfollowedByYa" and "KiKuFollowedBySa".
>
> Herbert
>

Sorry for reactivating this old thread, but I have a question about the
new regexp engine: it seems that some regular expressions no longer work
on AOO test builds. For example, on OOo 3.3 you can use

    \<[0-9]+[,|\.][0-9]*\>

to find decimal numbers regardless of whether the decimal separator is a
comma or a dot (the expression will find 125.25 and 1253,586), but this
expression does not work on AOO builds.

Are there changes to the regexp syntax? If yes, where are those changes
documented?

Thanks in advance
Ricardo
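
One possible explanation, offered here only as an assumption and not as a confirmed diagnosis of the AOO builds: the old engine understood \< and \> as start-of-word and end-of-word anchors, while many other engines read a backslash followed by punctuation as an escaped literal character and use \b for word boundaries. The sketch below uses Python's re module as a stand-in (it is not ICU) to show that behaviour. The sample text and the rewritten pattern are hypothetical. The rewrite also drops the | from the character class, since inside [...] it is a literal pipe and not an alternation.

```python
import re

# Hypothetical sample text containing both separator styles.
text = "Totals: 125.25 and 1253,586 but not 42"

# Pattern as written in the message. In Python's re, "\<" and "\>" are
# escaped literal angle brackets, NOT word-boundary anchors.
old_style = re.compile(r"\<[0-9]+[,|\.][0-9]*\>")

# Assumed rewrite: "\b" for the word boundaries, and "[,.]" for the
# separator (a dot needs no escape inside a character class).
new_style = re.compile(r"\b[0-9]+[,.][0-9]*\b")


def find_decimals(pattern, s):
    """Return every substring of s matched by the compiled pattern."""
    return [m.group(0) for m in pattern.finditer(s)]


old_matches = find_decimals(old_style, text)
new_matches = find_decimals(new_style, text)

# The old-style pattern only matches when literal brackets are present.
literal_matches = find_decimals(old_style, "<125.25>")

print(old_matches)
print(new_matches)
print(literal_matches)
```

If the ICU-based engine behaves the same way for \< and \>, replacing them with \b in the search dialog would be the first thing to try.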