harmony-commits mailing list archives

From "Nikolay Kuznetsov (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HARMONY-580) [classlib][regex][perf] Lookaround matching is slow
Date Thu, 22 Jun 2006 16:27:39 GMT
     [ http://issues.apache.org/jira/browse/HARMONY-580?page=all ]

Nikolay Kuznetsov updated HARMONY-580:
--------------------------------------

    Attachment: perf.patch

The attached patch only partially relieves the problem (we are still not faster in this test).
The problem is the behaviour of the lookbehind construct: during find(), it tries to match the
specified token starting from every character back to the beginning of the string before the
actual match is attempted.
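To make the cost described above concrete, here is a simplified model in Java. It is not Harmony's code: the class `LookbehindCost` and its methods are hypothetical, the lookbehind token is assumed to be `\s` as in the report's look pattern, and `Character.isWhitespace` stands in for `\s`. It counts how many candidate token starts a lookbehind check tries when find() visits every position, once scanning back to the beginning of the string and once bounded by the token's fixed width of one character.

```java
import java.util.Arrays;

public class LookbehindCost {

    // Does text[from, to) match the lookbehind token \s, i.e. exactly one whitespace char?
    // (Character.isWhitespace is used for brevity; it is close to, but not identical with, \s.)
    public static boolean tokenMatches(CharSequence text, int from, int to) {
        return to - from == 1 && Character.isWhitespace(text.charAt(from));
    }

    // Naive check: try every candidate start from index 0 up to pos.
    // steps[0] counts how many candidate starts were tried.
    public static boolean naive(CharSequence text, int pos, long[] steps) {
        for (int from = 0; from < pos; from++) {
            steps[0]++;
            if (tokenMatches(text, from, pos)) {
                return true;
            }
        }
        return false;
    }

    // Bounded check: the token has a fixed width of 1, so only from == pos - 1 can match.
    public static boolean bounded(CharSequence text, int pos, long[] steps) {
        if (pos < 1) {
            return false;
        }
        steps[0]++;
        return tokenMatches(text, pos - 1, pos);
    }

    public static void main(String[] args) {
        // Synthetic input with the same length as the report's search space.
        char[] chars = new char[28522];
        Arrays.fill(chars, 'x');
        chars[100] = ' ';
        String text = new String(chars);

        long[] naiveSteps = {0};
        long[] boundedSteps = {0};
        for (int pos = 0; pos <= text.length(); pos++) {   // every start position find() may try
            boolean a = naive(text, pos, naiveSteps);
            boolean b = bounded(text, pos, boundedSteps);
            if (a != b) {
                throw new AssertionError("strategies disagree at " + pos);
            }
        }
        System.out.println("naive steps:   " + naiveSteps[0]);
        System.out.println("bounded steps: " + boundedSteps[0]);
    }
}
```

Both strategies agree on where the lookbehind holds, but on the 28522-character input the naive scan tries 28522 * 28523 / 2 = 406766503 candidate starts, against 28522 for the bounded check: a quadratic versus linear difference.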

In the suggested patch I have moved the lookbehind check after the actual match. I believe that
in most cases this is the fastest order of checks.
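The idea of checking the lookbehind only after the main match can be illustrated at the application level with the public java.util.regex API. This is a sketch of the idea only, not the patch: the patch reorders checks inside Harmony's matcher, whereas the hypothetical class `MatchThenLookbehind` finds a candidate with the core pattern first and then tests the `((?=^)|(?<=\s))` condition at that candidate's start. The core regex is copied from the report, assuming the wrapped character class reads `[^ .!?\s$]`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchThenLookbehind {

    // Core URL pattern from the report, without the leading ((?=^)|(?<=\s)) group.
    public static final String CORE_REGEX =
        "(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)"
        + "([\\d\\w;/\\?:@=&$\\-+!*'~#%\\{\\}\\|]|[,.()\"][^ .!?\\s$])*";
    public static final Pattern CORE = Pattern.compile(CORE_REGEX);
    // The report's look pattern, kept for comparison.
    public static final Pattern LOOK = Pattern.compile("((?=^)|(?<=\\s))" + CORE_REGEX);

    // Same character set as java.util.regex's \s: [ \t\n\x0B\f\r]
    static boolean isRegexSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '\f' || c == '\r';
    }

    // Returns {start, end} of the first CORE match that starts at the beginning of the
    // input or right after whitespace, or null if there is none.
    public static int[] findMatchThenCheck(CharSequence text) {
        Matcher m = CORE.matcher(text);
        int from = 0;
        while (from <= text.length() && m.find(from)) {
            int start = m.start();
            // The lookbehind condition is evaluated once per candidate, not once per position.
            if (start == 0 || isRegexSpace(text.charAt(start - 1))) {
                return new int[] {start, m.end()};
            }
            from = start + 1;   // condition failed: retry one position further on
        }
        return null;
    }

    public static void main(String[] args) {
        int[] r = findMatchThenCheck("visit www.example.org today");
        System.out.println(r == null ? "no match" : "match at " + r[0] + ".." + r[1]);
    }
}
```

Because the `((?=^)|(?<=\s))` group is zero-width, a match found this way should start and end at the same positions as a find() with the look pattern; the difference is only how often the lookbehind condition is evaluated.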

Tim: do you have any other performance data about Harmony regex? If so, could you please share
the tests? As the author of this implementation, I am personally interested in improving its
performance :).

My current results on 1.7 GHz Pentium M are:

-- on jrockit-jdk1.5.0 --
Search space length = 28522
Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 20 Match not found
Took 10 Match not found
Took 10 Match not found
Took 20 Match not found
Took 10 Match not found
Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
DONE

-- on Harmony (working copy) --

Search space length = 28522
Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 30 Match not found
Took 20 Match not found
Took 20 Match not found
Took 20 Match not found
Took 20 Match not found
Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 30 Match not found
Took 30 Match not found
Took 30 Match not found
Took 30 Match not found
Took 30 Match not found
DONE  
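The figures above come from MatchingStuff.java, which is not reproduced in this message. A timing loop of roughly the same shape might look like the hypothetical `RegexTiming` class below; the synthetic input text is an assumption, and the "Took" values are assumed to be milliseconds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTiming {

    // Core URL pattern from the report (assuming the wrapped class reads [^ .!?\s$]).
    public static final String CORE =
        "(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)"
        + "([\\d\\w;/\\?:@=&$\\-+!*'~#%\\{\\}\\|]|[,.()\"][^ .!?\\s$])*";

    // Runs a single find() over the input; found[0] receives the result.
    // Returns elapsed wall-clock time in milliseconds.
    public static long timeFind(Pattern p, CharSequence text, boolean[] found) {
        long t0 = System.nanoTime();
        Matcher m = p.matcher(text);
        found[0] = m.find();
        return (System.nanoTime() - t0) / 1_000_000L;
    }

    static void run(String label, String regex, CharSequence text, int repeats) {
        Pattern p = Pattern.compile(regex);   // compiled once, outside the timed region
        System.out.println(label + " = " + regex);
        boolean[] found = new boolean[1];
        for (int i = 0; i < repeats; i++) {
            long ms = timeFind(p, text, found);
            System.out.println("Took " + ms + (found[0] ? " Match found" : " Match not found"));
        }
    }

    public static void main(String[] args) {
        // Synthetic search space of the reported length, containing none of the URL prefixes.
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 28522) {
            sb.append("lorem ipsum dolor ");
        }
        sb.setLength(28522);
        System.out.println("Search space length = " + sb.length());
        run("Pattern", CORE, sb, 5);
        run("Look pattern", "((?=^)|(?<=\\s))" + CORE, sb, 5);
        System.out.println("DONE");
    }
}
```

Timings from a loop like this depend heavily on JIT warm-up and timer resolution, which is one reason to repeat each measurement several times as the report does.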

> [classlib][regex][perf] Lookaround matching is slow
> ---------------------------------------------------
>
>          Key: HARMONY-580
>          URL: http://issues.apache.org/jira/browse/HARMONY-580
>      Project: Harmony
>         Type: Bug
>   Components: Classlib
>  Environment: WinXP, 2GHz Pentium M
>     Reporter: Tim Ellison
>  Attachments: MatchingStuff.java, perf.patch
>
> Running tests on the Harmony impl of regex and comparing it with another impl shows that
> Harmony is much slower at lookaround matching.  Here is the output from the testcase:
> -- on Sun jdk1.5.0_06 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> DONE
> -- on Harmony r412711 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 50 Match not found
> Took 20 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 8753 Match not found
> Took 8502 Match not found
> Took 8463 Match not found
> Took 8703 Match not found
> Took 8492 Match not found
> DONE

-- 
This message is automatically generated by JIRA.

