lucene-dev mailing list archives

From "Mark Miller (JIRA)"
Subject [jira] Commented: (LUCENE-966) A faster JFlex-based replacement for StandardAnalyzer
Date Thu, 02 Aug 2007 19:32:52 GMT


Mark Miller commented on LUCENE-966:

These issues seem odd.

Both JavaCC and Flex match with the same rules:
1. Longest match first
2. If match size is the same, use the first in the grammar
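As a rough illustration of those two rules, here is a minimal sketch of a longest-match tokenizer step with a first-rule tie-break. The rule names and regular expressions are simplified assumptions made up for this example (the <NUM> pattern here does not even require digits); they are not the actual StandardTokenizer grammar, and neither JavaCC nor JFlex works by looping over regexes like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    // Hypothetical rules, kept in grammar order (LinkedHashMap preserves insertion order).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("<ALPHANUM>", Pattern.compile("[A-Za-z0-9]+"));
        RULES.put("<WORD>", Pattern.compile("[A-Za-z]+"));
        RULES.put("<NUM>", Pattern.compile("[A-Za-z0-9]+(?:[-/.][A-Za-z0-9]+)+"));
    }

    // Returns "text,type" for the token starting at 'start', or null if no rule matches.
    static String nextToken(String text, int start) {
        String bestType = null;
        int bestEnd = start;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(text);
            m.region(start, text.length());
            // Rule 1: a strictly longer match wins.
            // Rule 2: on equal length the earlier rule is kept, because '>' is strict.
            if (m.lookingAt() && m.end() > bestEnd) {
                bestType = rule.getKey();
                bestEnd = m.end();
            }
        }
        return bestType == null ? null : text.substring(start, bestEnd) + "," + bestType;
    }

    public static void main(String[] args) {
        System.out.println(nextToken("safari-0-sheikh", 0)); // longest match
        System.out.println(nextToken("safari", 0));          // tie between the first two rules
    }
}
```

With these assumed rules, the whole of "safari-0-sheikh" is claimed by the <NUM> rule because it is the longest match, while plain "safari" goes to <ALPHANUM> because it is listed before the equally long <WORD> match.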



The OLD is correct and the NEW should not match <NUM>. <NUM> should break on '/'
and '.', and every other segment produced by the break must contain a digit for a <NUM>
match to occur. This is not the case here.
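The condition described above can be sketched as a small check. This is only one reading of that sentence, not the real grammar: the class and method names are hypothetical, and the break set ('-', '/', '.') is an assumption taken from the characters that appear in this thread.

```java
import java.util.regex.Pattern;

class NumShapeCheck {
    // Assumed break characters; the real grammar may allow others.
    private static final Pattern BREAK = Pattern.compile("[-/.]");

    static boolean hasDigit(String s) {
        return s.chars().anyMatch(Character::isDigit);
    }

    // True if the token splits into two or more alphanumeric segments and
    // every other segment (all even or all odd positions) contains a digit.
    static boolean looksLikeNum(String token) {
        String[] parts = BREAK.split(token, -1); // -1 keeps trailing empty segments
        if (parts.length < 2) {
            return false;
        }
        boolean evenOk = true;
        boolean oddOk = true;
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            if (part.isEmpty() || !part.chars().allMatch(Character::isLetterOrDigit)) {
                return false;
            }
            if (!hasDigit(part)) {
                if (i % 2 == 0) {
                    evenOk = false;
                } else {
                    oddOk = false;
                }
            }
        }
        return evenOk || oddOk;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeNum("safari-0-sheikh"));
        System.out.println(looksLikeNum("a/b.c"));
    }
}
```

Under this reading, "safari-0-sheikh" qualifies because its middle segment is a digit, whereas a token such as "a/b.c" does not, since none of its segments contains a digit.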



Something is wrong with the NEW one. <NUM> is certainly a valid longer match.



Again, something seems wrong with the NEW. (safari-0-sheikh,12011,12026,type=<NUM>)
is a correct and longer match than (safari,12011,12017,type=<ALPHANUM>).

It would be nice to have the source text for these comparisons.

Also, a hard vote against StandardAnalyzer2 <g>. Default is arguable as well, since this
wouldn't be the default analyzer you should use in many cases (I don't like Standard for
the same reason).

From the latest samples, I would say something is off with the NEW, and the OLD appears mostly
correct.

- Mark

> A faster JFlex-based replacement for StandardAnalyzer
> -----------------------------------------------------
>                 Key: LUCENE-966
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Stanislaw Osinski
>             Fix For: 2.3
>         Attachments: jflex-analyzer-patch.txt, jflex-analyzer-r560135-patch.txt,
>                      jflex-analyzer-r561292-patch.txt, jflex-analyzer-r561693-compatibility.txt
>
> JFlex can be used to generate a faster (up to several times) replacement
> for StandardAnalyzer. Will add a patch and a simple benchmark code in a while.

