lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: [jira] Commented: (LUCENE-966) A faster JFlex-based replacement for StandardAnalyzer
Date Thu, 02 Aug 2007 21:18:35 GMT
> Mark -- have you tried the jflex-analyzer-r560135-patch.txt patch with your wikipedia
> diff test? That's the early one whose grammar was "dot for dot" translated from the original
> JavaCC spec -- for further patches I did some "optimizations", which seem to have broken the
> compatibility...
>
The test is Mike's, and I think it was run against your latest patch. It
looks like the optimizations might have to go, then?
> Incidentally, what was the motivation for requiring the <NUM> token to have numbers
> only in every second segment and not in any segment?
>
I don't think the rule is "every second segment" but rather "at least
every other segment". Why the rule was written that way, I am not sure.
My guess is that it was a good rule of thumb for catching a lot of serial
numbers, model numbers, etc., without going overboard in the matching.
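
As a rough illustration of that reading of the rule, here is a simplified sketch in plain Java. It is not the grammar from the patch or from the JavaCC spec: the class name, the method names, and the separator set are assumptions made for the example. A token passes if it has at least two separator-delimited alphanumeric segments and either every even-positioned or every odd-positioned segment contains a digit; the remaining segments may or may not contain digits.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the "at least every other segment" reading.
public class NumRuleSketch {
    // Assumed separator characters for this sketch.
    private static final Pattern SEPARATOR = Pattern.compile("[-_/.,]");

    // True if the segment contains at least one digit.
    public static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // True if the segment is non-empty and consists only of letters and digits.
    public static boolean isAlphanumeric(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // True if digits appear in every even-positioned segment or in every
    // odd-positioned segment, i.e. in at least every other segment.
    public static boolean looksLikeNum(String token) {
        String[] segments = SEPARATOR.split(token, -1);
        if (segments.length < 2) {
            return false; // a single segment is just a word or a plain number
        }
        boolean evenOk = true;
        boolean oddOk = true;
        for (int i = 0; i < segments.length; i++) {
            if (!isAlphanumeric(segments[i])) {
                return false;
            }
            if (!hasDigit(segments[i])) {
                if (i % 2 == 0) {
                    evenOk = false;
                } else {
                    oddOk = false;
                }
            }
        }
        return evenOk || oddOk;
    }

    public static void main(String[] args) {
        String[] samples = {"ab-12", "12-ab-34", "ab-cd-12", "ab-cd"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikeNum(sample));
        }
    }
}
```

Under this sketch "ab-12" and "12-ab-34" pass, while "ab-cd-12" does not, because neither the even-positioned nor the odd-positioned segments all contain a digit. That is the kind of restraint I mean by not going overboard.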
