lucene-dev mailing list archives

From "Michael McCandless (JIRA)"
Subject [jira] Commented: (LUCENE-966) A faster JFlex-based replacement for StandardAnalyzer
Date Thu, 02 Aug 2007 11:56:52 GMT


Michael McCandless commented on LUCENE-966:

If it really comes down to emulating the bugs/oddities in JavaCC, then I
think it's not worth polluting the new tokenizer with these legacy
bugs, unless one or two cases can be matched exactly without degrading
performance too badly?

And maybe what we should do is make this a new tokenizer, calling it
StandardAnalyzer2, and then deprecate the existing StandardAnalyzer?
Then remove any & all JavaCC bug emulation from the new one.

This way people relying on the precise bugs in JavaCC tokenization are
not hurt when upgrading to 2.3, and they get a chance to migrate to the
new one (the deprecated StandardAnalyzer stays around for one release).
And new users will get the new, faster one.
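To make the proposed migration path concrete, here is a minimal Java sketch of the "add a new class, deprecate the old one for a release" pattern. Only the names StandardAnalyzer and StandardAnalyzer2 come from the proposal; the `TextAnalyzer` interface, the `splitOnWhitespace` helper and both class bodies are hypothetical placeholders (they simply split on whitespace). They are not Lucene's actual Analyzer/TokenStream API, nor the real JavaCC or JFlex grammars.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal analyzer interface (NOT Lucene's real Analyzer API). */
interface TextAnalyzer {
    List<String> tokenize(String text);

    /** Placeholder tokenization shared by both sketches: split on runs of whitespace. */
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {      // "".split(...) yields [""], so drop empties
                tokens.add(t);
            }
        }
        return tokens;
    }
}

/**
 * Hypothetical stand-in for the old JavaCC-based analyzer. Its behaviour,
 * legacy quirks included, is left untouched so upgrades do not break.
 *
 * @deprecated Use {@link StandardAnalyzer2} instead; kept for one release.
 */
@Deprecated
class StandardAnalyzer implements TextAnalyzer {
    @Override
    public List<String> tokenize(String text) {
        // Placeholder for the legacy JavaCC-generated tokenization.
        return TextAnalyzer.splitOnWhitespace(text);
    }
}

/** Hypothetical stand-in for the new JFlex-based analyzer, free of bug emulation. */
class StandardAnalyzer2 implements TextAnalyzer {
    @Override
    public List<String> tokenize(String text) {
        // Placeholder for the JFlex-generated tokenization.
        return TextAnalyzer.splitOnWhitespace(text);
    }
}

class MigrationDemo {
    public static void main(String[] args) {
        TextAnalyzer legacy = new StandardAnalyzer();  // still compiles; javac emits a deprecation warning
        TextAnalyzer fresh = new StandardAnalyzer2();
        System.out.println(legacy.tokenize("Lucene  2.3 release"));
        System.out.println(fresh.tokenize("Lucene  2.3 release"));
    }
}
```

Because the old class only gains an annotation and a Javadoc tag, existing callers keep compiling (with a warning) and behave exactly as before, while new code can target StandardAnalyzer2 directly.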

> A faster JFlex-based replacement for StandardAnalyzer
> -----------------------------------------------------
>                 Key: LUCENE-966
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Stanislaw Osinski
>             Fix For: 2.3
>         Attachments: jflex-analyzer-patch.txt, jflex-analyzer-r560135-patch.txt,
>                       jflex-analyzer-r561292-patch.txt, jflex-analyzer-r561693-compatibility.txt
> JFlex can be used to generate a faster (up to several times) replacement
> for StandardAnalyzer. Will add a patch and simple benchmark code in a while.
