lucene-java-user mailing list archives

From "Stanislaw Osinski" <stanislaw.osin...@man.poznan.pl>
Subject Re: StandardTokenizer is slowing down highlighting a lot
Date Wed, 25 Jul 2007 11:19:31 GMT
>
> Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really
> limited by JavaCC speed. You cannot shave much more performance out of
> the grammar as it is already about as simple as it gets.


JavaCC is slow indeed. We used it for a while for Carrot2, but then (3 years
ago :) switched to JFlex, which for roughly the same grammar was sometimes
up to 10x (!) faster. You can have a look at our JFlex specification at:

http://carrot2.svn.sourceforge.net/viewvc/carrot2/trunk/carrot2/components/carrot2-util-tokenizer/src/org/carrot2/util/tokenizer/parser/jflex/JFlexWordBasedParserImpl.jflex?view=markup

This one seems more complex than StandardAnalyzer's grammar, but it's much
faster anyway.

If anyone is interested, I could prepare a JFlex-based Analyzer equivalent
(to the extent possible) to the current StandardAnalyzer, which might offer
nice indexing and highlighting speed-ups.
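Some background on why a generated scanner can be fast: JFlex compiles the grammar into a DFA-based scanner that reads each input character once and applies the longest-match rule. The Java code below is a hand-written, hypothetical sketch of that kind of single-pass loop. It is not JFlex output and does not reproduce the Carrot2 or StandardAnalyzer grammar, and it does not use the Lucene API. The class name `SimpleScanner` and its WORD/NUM token types are assumptions made for illustration. Each token keeps its start/end offsets, because a highlighter needs those to map tokens back to the original text.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not JFlex output, not Carrot2's or Lucene's grammar):
 * a single-pass scanner driven by character classes, with the longest-match rule.
 */
class SimpleScanner {

    /** A token: its text, type and [start, end) character offsets in the input. */
    public static final class Tok {
        public final String text;
        public final int start;
        public final int end;
        public final String type;

        Tok(String text, int start, int end, String type) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + ":" + text + "[" + start + "," + end + ")";
        }
    }

    // Character classes; a generated scanner would use precomputed lookup tables instead.
    private static final int OTHER = 0, LETTER = 1, DIGIT = 2;

    private static int classOf(char c) {
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    /** Splits input into runs of letters/digits: NUM if all digits, otherwise WORD. */
    public static List<Tok> scan(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            if (classOf(input.charAt(i)) == OTHER) {
                i++; // skip separators (spaces, punctuation)
                continue;
            }
            int start = i;
            boolean allDigits = true;
            // Consume the longest run of letters/digits, looking at each char once.
            while (i < n) {
                int cls = classOf(input.charAt(i));
                if (cls == OTHER) break;
                if (cls == LETTER) allDigits = false;
                i++;
            }
            out.add(new Tok(input.substring(start, i), start, i, allDigits ? "NUM" : "WORD"));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tok t : scan("JFlex is up to 10x faster")) {
            System.out.println(t);
        }
    }
}
```

Running `main` prints one token per line, starting with `WORD:JFlex[0,5)`; the offsets are exactly what a highlighter would use to wrap matching terms in the original text.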

Best,

Staszek

-- 
Stanislaw Osinski, stanislaw.osinski@carrot-search.com
http://www.carrot-search.com
