lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless" <luc...@mikemccandless.com>
Subject Re: Post mortem kudos for (LUCENE-843) :)
Date Thu, 12 Jul 2007 19:22:48 GMT

Thank you for the compliments, and thank you for being such early
adopter testers!  I'm very glad you didn't hit any issues :)

> before LUCENE-843 indexing speed was 5-6k records per second (and I
> believed this was already as fast as it gets)
> after (trunk version yesterday) 60-65k documents per second! All
> (exhaustive!) tests pass on this index.

Wow, 10X speedup is even faster than my fastest results!

> autocommit = false, 24M RAMBuffer, using char[] instead of String
> for Token (this was the reason we separated Analysis in two phases,
> leaving for Lucene Analyzer only simple whitespace tokenization)

Looks like you're doing everything right to get fastest performance.

You can also re-use the Document & Field instances, and also the Token
instance in your analyzer and that should help more.

Was 24M (and not more) clearly the fastest performance?

Also note that you must workaround LUCENE-845 (still open):

  http://issues.apache.org/jira/browse/LUCENE-845;jsessionid=E110C767DA8EFEC5B7D39D00EEF1EB74

You should set your maxBufferedDocs to something "close to but always
above" how many docs actually get flushed after 24 MB RAM is full,
else you could spend way too much time merging.  I'm working on
LUCENE-845 now but not yet sure when it will be resolved...

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message