lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: StandardTokenizer is slowing down highlighting a lot
Date Wed, 25 Jul 2007 11:53:05 GMT

On Jul 25, 2007, at 7:19 AM, Stanislaw Osinski wrote:

>> Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really
>> limited by JavaCC speed. You cannot shave much more performance out of
>> the grammar as it is already about as simple as it gets.
>
> JavaCC is slow indeed. We used it for a while for Carrot2, but then (3 years
> ago :) switched to JFlex, which for roughly the same grammar would sometimes
> be up to 10x (!) faster. You can have a look at our JFlex specification at:
>
> http://carrot2.svn.sourceforge.net/viewvc/carrot2/trunk/carrot2/components/carrot2-util-tokenizer/src/org/carrot2/util/tokenizer/parser/jflex/JFlexWordBasedParserImpl.jflex?view=markup
>
> This one seems more complex than the StandardAnalyzer's but it's much faster
> anyway.
>
> If anyone is interested, I could prepare a JFlex based Analyzer equivalent
> (to the extent possible) to current StandardAnalyzer, which might offer nice
> indexing and highlighting speed-ups.

+1. I think a lot of people would be interested in a faster StandardAnalyzer.



