lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: performance feedback
Date Wed, 09 Jul 2008 14:37:53 GMT

This is great to hear!

If you tweak things a bit (increase the RAM buffer size, use
autoCommit=false, use threads, etc.) you should be able to eke out some
more gains...
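
As a rough illustration of the "use threads" part, here is a minimal
sketch that feeds documents to one shared writer from a fixed pool of
threads. It uses only the JDK so it runs standalone: DocSink and
indexAll are hypothetical names standing in for a shared IndexWriter
and your own indexing loop, not Lucene API. With Lucene 2.3 the sink
would presumably wrap IndexWriter.addDocument (IndexWriter is
documented as thread-safe), after opening the writer with
autoCommit=false and raising the buffer via setRAMBufferSizeMB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadedIndexingSketch {

    /** Hypothetical stand-in for a shared, thread-safe writer (not Lucene API). */
    interface DocSink {
        void addDocument(String doc) throws Exception;
    }

    /**
     * Submits every document to the one shared sink from a fixed thread pool
     * and returns the number of documents whose addDocument call threw.
     */
    static int indexAll(List<String> docs, DocSink sink, int numThreads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        AtomicInteger failures = new AtomicInteger();
        for (String doc : docs) {
            pool.execute(() -> {
                try {
                    sink.addDocument(doc);
                } catch (Exception e) {
                    failures.incrementAndGet();
                }
            });
        }
        // Stop accepting new work, then wait for the queued documents to finish.
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("indexing did not finish in time");
        }
        return failures.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        // Counting sink used only for the demo; a real sink would index the doc.
        AtomicInteger indexed = new AtomicInteger();
        int failures = indexAll(docs, d -> indexed.incrementAndGet(), 4);
        System.out.println("indexed=" + indexed.get() + " failures=" + failures);
    }
}
```

The thread count of 4 is arbitrary; a sensible value depends on the
number of cores and on how much of the time goes to analysis versus
I/O.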

Are you storing fields & using term vectors on any of your fields?

Mike

Beard, Brian wrote:

>
> I just did an update from lucene 2.2.0 to 2.3.2 and thought I'd give
> some kudos for the indexing performance enhancements.
>
> The lucene indexing portion is about 6-8 times faster. Previously we
> were doing ~60-120 documents per second, now we're between 400-1000,
> depending on the type of document, size, and how many fields there are.
> Haven't done a formal comparison side by side, but certainly is
> substantially faster.
>
> The gain would have been equal to the 8-10 times in the readme, but
> using custom tokenizers slows things down a little vs. using the
> standard one. At first I didn't realize I should use reusableTokenFilter,
> which bypassed the custom tokenizers and had the 8-10x improvement. Maybe
> there's some more gain to be had that I can pursue.
>
> We index from 5 to 20 fields per document (in serial through
> IndexWriter). Most are 3-5K total size, but can vary quite a bit. Total
> index size (eventually) will be ~15G.
>
> Thanks,
>
> Brian Beard
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

