lucene-java-user mailing list archives

From "Stanislaw Osinski" <>
Subject Re: StandardTokenizer is slowing down highlighting a lot
Date Wed, 25 Jul 2007 11:19:31 GMT
> Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really
> limited by JavaCC speed. You cannot shave much more performance out of
> the grammar as it is already about as simple as it gets.

JavaCC is slow indeed. We used it for a while for Carrot2, but then (3 years
ago :) switched to JFlex, which for roughly the same grammar would sometimes
be up to 10x (!) faster. You can have a look at our JFlex specification at:

This one seems more complex than the StandardAnalyzer's, but it's much faster.

If anyone is interested, I could prepare a JFlex-based Analyzer equivalent
(to the extent possible) to the current StandardAnalyzer, which might offer nice
indexing and highlighting speed-ups.



Stanislaw Osinski,
