lucene-java-user mailing list archives

From: Mark Miller
Subject: Re: StandardTokenizer is slowing down highlighting a lot
Date: Wed, 25 Jul 2007 11:29:12 GMT

I would be very interested. I have been playing around with ANTLR to see 
if it is any faster than JavaCC, but I haven't seen great gains in my 
simple tests. I had not considered trying JFlex.

I am sure a faster StandardAnalyzer would be greatly appreciated. 
StandardAnalyzer appears widely used and horrendously slow. Even better 
would be a StandardAnalyzer that could have different recognizers 
enabled/disabled. For example, dropping NUM recognition from the current 
StandardAnalyzer, if you don't need it, gains about 25% in speed.
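
To make the enable/disable idea concrete, here is a minimal sketch. It assumes a regex-based tokenizer instead of the JavaCC grammar, so it says nothing about the 25% figure above. The class name ConfigurableTokenizer, the Recognizer enum, and the three patterns are hypothetical simplifications, not Lucene's API or the StandardTokenizer grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a tokenizer with switchable recognizers; not Lucene code. */
class ConfigurableTokenizer {

    /** Recognizers in match-priority order (earlier entries win at a given position). */
    enum Recognizer {
        EMAIL("\\w+(?:\\.\\w+)*@\\w+(?:\\.\\w+)+"),  // simplified e-mail pattern (assumption)
        NUM("\\d+(?:[.,]\\d+)+"),                     // digits with inner '.' or ',' such as 2.2
        ALPHANUM("[\\p{L}\\p{N}]+");                  // plain runs of letters and digits

        final String regex;

        Recognizer(String regex) {
            this.regex = regex;
        }
    }

    private final Pattern pattern;

    /** Compiles one alternation that contains only the enabled recognizers. */
    ConfigurableTokenizer(Set<Recognizer> enabled) {
        if (enabled.isEmpty()) {
            throw new IllegalArgumentException("enable at least one recognizer");
        }
        StringBuilder sb = new StringBuilder();
        for (Recognizer r : Recognizer.values()) {  // declaration order = priority
            if (!enabled.contains(r)) {
                continue;  // a disabled recognizer never reaches the compiled pattern
            }
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append("(?:").append(r.regex).append(')');
        }
        this.pattern = Pattern.compile(sb.toString());
    }

    /** Returns the matched tokens in order of appearance; unmatched text is skipped. */
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```

With every recognizer enabled, "Lucene 2.2 rocks" tokenizes to [Lucene, 2.2, rocks]. With only ALPHANUM enabled, the same input gives [Lucene, 2, 2, rocks], because the NUM rule is never compiled into the pattern.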

- Mark

Stanislaw Osinski wrote:
>> Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really
>> limited by JavaCC speed. You cannot shave much more performance out of
>> the grammar as it is already about as simple as it gets.
> JavaCC is slow indeed. We used it for a while for Carrot2, but then (3 years
> ago :) switched to JFlex, which for roughly the same grammar would sometimes
> be up to 10x (!) faster. You can have a look at our JFlex specification at:

> This one seems more complex than the StandardAnalyzer's but it's much faster
> anyway.
>
> If anyone is interested, I could prepare a JFlex based Analyzer equivalent
> (to the extent possible) to current StandardAnalyzer, which might offer nice
> indexing and highlighting speed-ups.
>
> Best,
> Staszek
