lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: 2.3.2 Indexing Performance
Date Fri, 08 Aug 2008 23:36:21 GMT

Thanks for the data point!

This is expected -- a lot of work went into increasing IndexWriter's
throughput in 2.3.

Actually, I'd expect even more speedup, if indeed Lucene is the
bottleneck in your app.  To find out, you could time just creating/
parsing & tokenizing the docs (from whatever is holding them), without
indexing them.  You might also eke out more performance by following
the suggestions here:

     http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
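
To make the bottleneck check above concrete, here is a minimal sketch of timing the preparation stage on its own. Nothing in it is Lucene API: StageTimer, timePrepareOnly and prepareFraction are hypothetical names, and the prepare function is a stand-in for whatever parse + tokenize step your app runs before handing docs to IndexWriter.

```java
import java.util.List;
import java.util.function.Function;

/** Times the document-preparation stage by itself, with no indexing involved. */
final class StageTimer {

    /**
     * Applies prepare to every raw record, collects the results in out,
     * and returns the elapsed wall-clock time in nanoseconds.
     * prepare is a hypothetical stand-in for your own parse + tokenize step.
     */
    static <R, D> long timePrepareOnly(List<R> rawRecords, Function<R, D> prepare, List<D> out) {
        long start = System.nanoTime();
        for (R raw : rawRecords) {
            out.add(prepare.apply(raw));
        }
        return System.nanoTime() - start;
    }

    /** Share of the total job time spent in preparation, capped at 1.0. */
    static double prepareFraction(long prepareNanos, long totalNanos) {
        if (totalNanos <= 0) {
            throw new IllegalArgumentException("totalNanos must be positive");
        }
        return Math.min(1.0, (double) prepareNanos / totalNanos);
    }
}
```

If the fraction comes out close to 1.0 when compared against the full job time, most of the run is spent outside Lucene, and tuning IndexWriter alone won't buy much more.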

Since you've got 4 CPUs and lots of RAM you should definitely use  
multiple indexing threads with a large RAM buffer.
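
As a rough sketch of the multi-threaded setup, assuming a single writer shared by all threads: the code below uses only the JDK, and DocSink is a hypothetical stand-in for that shared writer, not a Lucene type. In a real job the sink would wrap one IndexWriter (which can be shared across threads) with its RAM buffer raised, e.g. via setRAMBufferSizeMB in 2.3. The thread count and buffer size are things to tune by measurement.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelFeeder {

    /** Hypothetical stand-in for one shared, thread-safe writer. */
    interface DocSink<D> {
        void add(D doc) throws Exception;
    }

    /**
     * Starts numThreads workers that pull documents off a shared counter
     * and add them to the same sink. Returns how many adds succeeded.
     */
    static <D> int feed(List<D> docs, DocSink<D> sink, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        AtomicInteger next = new AtomicInteger(0);   // index of the next unclaimed doc
        AtomicInteger added = new AtomicInteger(0);  // count of successful adds
        for (int t = 0; t < numThreads; t++) {
            pool.execute(() -> {
                int i;
                while ((i = next.getAndIncrement()) < docs.size()) {
                    try {
                        sink.add(docs.get(i));
                        added.incrementAndGet();
                    } catch (Exception e) {
                        // Sketch only: a real job would log or collect failures.
                    }
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for all workers to drain the list.
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return added.get();
    }
}
```

With 4 CPUs, starting at 4 threads and comparing wall-clock time against the single-threaded run is a reasonable first experiment.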

Mike

Gary Moore wrote:

> Parsing and indexing 4.5 million MARC/XML bibliographic records was
> requiring ~14 hrs. using 2.2.  The same job using 2.3 takes ~5 hrs.
> on the same platform -- a quad processor Sun V440 w/8GB memory.
> I'm using the PerFieldAnalyzerWrapper (StandardAnalyzer and
> SnowballAnalyzer).
>
> I'm impressed!  Is this typical?
>
> Gary Moore
> gary@littlebunch.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

