Return-Path: Delivered-To: apmail-harmony-dev-archive@www.apache.org Received: (qmail 38515 invoked from network); 4 Sep 2007 10:08:22 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 4 Sep 2007 10:08:22 -0000 Received: (qmail 10782 invoked by uid 500); 4 Sep 2007 10:08:15 -0000 Delivered-To: apmail-harmony-dev-archive@harmony.apache.org Received: (qmail 10686 invoked by uid 500); 4 Sep 2007 10:08:15 -0000 Mailing-List: contact dev-help@harmony.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@harmony.apache.org Delivered-To: mailing list dev@harmony.apache.org Received: (qmail 10677 invoked by uid 99); 4 Sep 2007 10:08:15 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 04 Sep 2007 03:08:15 -0700 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of liyilei1979@gmail.com designates 209.85.128.187 as permitted sender) Received: from [209.85.128.187] (HELO fk-out-0910.google.com) (209.85.128.187) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 04 Sep 2007 10:09:21 +0000 Received: by fk-out-0910.google.com with SMTP id 18so1657990fks for ; Tue, 04 Sep 2007 03:07:47 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=Us0CtWyX87bs4wRQFguyvRyo07qdEbpRJ6ZqVwCVhkLKjnCkN+eD39cHeIyN1AezwyzUSCdAjpFu4IRHYB8tyJABrR6WMXFfIXOZDihBTgUjl3qd0F6em+E1m7mnmx/Fy1jE8HJKqx26sNAJQfrSHLeUNvcNraGHdZTCxsRoNCg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=GxlOY4JhDurN21IIGOFDGeDxPYPFUMniIWozfpappxuYq1IGM8cDz5W2DF0UdW5qaJy7nUFmPDGhSJh37Mh35ZmBASBraqowZl7dbDv5CCTeLYKmgUu+8iwlc5uSoQ6RnBtreYz6CeYFsZdkPhX/yqfc5x3TetXYHiSfcF3NcJc= Received: by 10.82.186.5 with SMTP id j5mr4135374buf.1188900466259; Tue, 04 Sep 2007 03:07:46 -0700 (PDT) Received: by 10.82.159.3 with HTTP; Tue, 4 Sep 2007 03:07:46 -0700 (PDT) Message-ID: Date: Tue, 4 Sep 2007 18:07:46 +0800 From: "Leo Li" To: dev@harmony.apache.org Subject: Re: [classlib][regex] Performance of regular expression is 30% of RI In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <5c8e69f0709040250q18bcb9a2h47cfa8552dacedae@mail.gmail.com> X-Virus-Checked: Checked by ClamAV on apache.org On 9/4/07, Yang Paulex wrote: > 2007/9/4, Jimmy,Jing Lv : > > > > Hi, > > > > A micro benchmark[1] on regex shows our implementation is much > > slower than RI, it cost Harmony ~23 seconds while RI ~7 seconds to > > complete the test. I use windows XP, drlvm and Harmony M2 build, and > > then replace regex classes with RI's while testing RI. > > > > Regular expression is very improtant and core classlib for > > Harmony, make it performs well will do benefit to Harmony performance. > > We may find some way to make it faster. > > > > According to Spec, regex use NFA algorithm[2] which is a little > > different with perl regular expression. As a result, we can not change > > the basic algorithm to DFA to improve performance. However still > > there's alternative algorithm of NFA. I googled open source regular > > experssion implementation, it is said that PCRE[3](Perl Compatible > > Regular Expressions) algorithm is of very high performance and luckily > > it is BSD licensed! > > However the source is so large and C-written, and a quick glance > > at complie function I find it rather complex, And java regular > > expression is different from perl's so that a directly wrapping will > > be surely un-workable. > > > Agreed, I don't think it's a good idea to implement regex in JNI, the > Java-native bridge penalty may be too high for regex. > > > But we still may read its implementation and > > find out some key point to improve, or some one already has some idea > > on tuning? > > > It would be interesting to know the hotspot of our current implementations. The interesting thing is that even Pattern.compile("a"), such an simple pattern, cost more than 2 times on harmony than on RI. And I have traced the steps of the case, there is no loop, recursive or so on. Seems it is just we do more than RI. I am trying to make further study by profiling. > > > I'm not a regular expression expert, please correct me if I miss > > something. Any comments/suggestions? > > > > [1] > > public void test_Benchmark() throws Exception { > > // warn up phase > > for (int i = 0; i < 10000; i++) { > > Pattern p = Pattern.compile > > ("\\A\\p{Alpha}[\\p{Alnum}+-.]*\\z"); > > Matcher m = p.matcher("aaaaab"); > > m.matches(); > > } > > > > // benchmark start > > long time = System.currentTimeMillis(); > > for (int i = 0; i < 1000000; i++) { > > Pattern p = Pattern.compile > > ("\\A\\p{Alpha}[\\p{Alnum}+-.]*\\z"); > > Matcher m = p.matcher("aaaaab"); > > m.matches(); > > } > > System.out.println(System.currentTimeMillis() - time); > > } > > > > [2] file:///C:/Spec/docs/api/java/util/regex/Pattern.html > > [3] http://www.pcre.org/ > > -- > > > > Best Regards! > > > > Jimmy, Jing Lv > > China Software Development Lab, IBM > > > > > > -- > Paulex Yang > China Software Development laboratory > IBM > -- Leo Li China Software Development Lab, IBM