lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Resolved: (LUCENE-966) A faster JFlex-based replacement for StandardAnalyzer
Date Wed, 08 Aug 2007 22:30:59 GMT


Michael McCandless resolved LUCENE-966.

       Resolution: Fixed
    Lucene Fields: [New, Patch Available]  (was: [New])

OK I committed this!  Thank you Stanislaw!

I ran a quick perf test on Wikipedia (first 50K docs only) and found
the new StandardTokenizer is ~6X faster -- awesome :)

I made these small additional changes over the final patch before

  * I removed StandardAnalyzer.html "grammar doc" generation from
    build.xml since it was using jjdoc.  Stanislaw, is there something
    in jflex that can generate a BNF description of the grammar as

  * I removed the @author tag from we are
    removing all such tags and instead giving credit in CHANGES.txt.

  * I removed the whitespace-only diffs from common-build.xml &

  * I put back the big comment that describes this tokenizer in

  * Put standard Apache copyright headers in all sources.

> A faster JFlex-based replacement for StandardAnalyzer
> -----------------------------------------------------
>                 Key: LUCENE-966
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Stanislaw Osinski
>             Fix For: 2.3
>         Attachments:, jflex-analyzer-patch.txt, jflex-analyzer-r560135-patch.txt,
> jflex-analyzer-r561292-patch.txt, jflex-analyzer-r561693-compatibility.txt, jflex-analyzer-r562378-patch-nodup.txt,
>
> JFlex ( can be used to generate a faster (up to several times) replacement
> for StandardAnalyzer. Will add a patch and a simple benchmark code in a while.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
