lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: 2.3.2 Indexing Performance
Date Wed, 01 Oct 2008 19:17:02 GMT

Awesome!  Thanks for following up.

Mike

Gary Moore wrote:

> Finally got back to this.  The great bulk of the time is spent  
> parsing/tokenizing.  So, using 10 threads parsing/analyzing the 4.5M  
> docs and feeding them to an IndexWriter took 106 minutes including a  
> final optimization.   The index is 5.6 GB.   I'm tempted to try  
> multiple indexing threads but my guess is it won't buy that much  
> since the async writer more than kept up with the thread queue.
>
> Now, I'm even more impressed with 2.3!
> -Gary
> Michael McCandless wrote:
>>
>> Thanks for the data point!
>>
>> This is expected -- a lot of work went into increasing IndexWriter's
>> throughput in 2.3.
>>
>> Actually, I'd expect even more speedup, if indeed Lucene is the
>> bottleneck in your app.  To check, you could measure how long just
>> creating, parsing and tokenizing the docs (from whatever is holding
>> them) takes.  You might also eke out more performance by following
>> the suggestions here:
>>
>>    http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>>
>> Since you've got 4 CPUs and lots of RAM you should definitely use  
>> multiple indexing threads with a large RAM buffer.
>>
>> Mike
>>
>> Gary Moore wrote:
>>
>>> Parsing and indexing 4.5 million MARC/XML bibliographic records
>>> was requiring ~14 hrs. using 2.2.  The same job using 2.3 takes
>>> ~5 hrs. on the same platform -- a quad processor Sun V440 w/8GB
>>> memory.   I'm using the PerFieldAnalyzerWrapper (StandardAnalyzer
>>> and SnowballAnalyzer).
>>>
>>> I'm impressed!  Is this typical?
>>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
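
The setup Gary describes (several threads parsing and tokenizing, one writer draining a queue) and Mike's suggestion to time the parse/tokenize stage on its own can be sketched roughly as follows. This is a minimal, standard-library-only Java sketch, not code from the thread: ParsePipelineSketch, DocSink, Stats, tokenize and the queue size of 256 are all hypothetical names and values, and DocSink.add merely stands in for whatever hands a parsed document to Lucene (for example a call to IndexWriter.addDocument). The toy tokenizer is an assumption too; a real job would run its own MARC/XML parser and analyzers in its place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

class ParsePipelineSketch {

    /** Hypothetical stand-in for the indexing call (e.g. IndexWriter.addDocument). */
    interface DocSink {
        void add(List<String> tokens);
    }

    /** Timing summary: parse time is summed over all parser threads. */
    static final class Stats {
        final int docs;
        final long parseNanos;
        final long sinkNanos;

        Stats(int docs, long parseNanos, long sinkNanos) {
            this.docs = docs;
            this.parseNanos = parseNanos;
            this.sinkNanos = sinkNanos;
        }
    }

    /** Toy tokenizer (assumption): lowercase, split on runs of non-word characters. */
    static List<String> tokenize(String raw) {
        List<String> tokens = new ArrayList<>();
        for (String t : raw.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Parses on parserThreads threads; the calling thread acts as the single writer. */
    static Stats run(List<String> rawDocs, int parserThreads, DocSink sink) {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(256);
        AtomicLong parseNanos = new AtomicLong();
        ExecutorService parsers = Executors.newFixedThreadPool(parserThreads);
        long sinkNanos = 0;
        try {
            for (String raw : rawDocs) {
                if (raw == null) {
                    throw new IllegalArgumentException("null document");
                }
                parsers.submit(() -> {
                    long t0 = System.nanoTime();
                    List<String> tokens = tokenize(raw);
                    parseNanos.addAndGet(System.nanoTime() - t0);
                    queue.put(tokens); // blocks if the writer falls behind
                    return null;
                });
            }
            // Drain exactly one parsed document per input document.
            for (int i = 0; i < rawDocs.size(); i++) {
                List<String> tokens = queue.take();
                long t0 = System.nanoTime();
                sink.add(tokens);
                sinkNanos += System.nanoTime() - t0;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } finally {
            parsers.shutdownNow(); // also releases parsers if the sink failed
        }
        return new Stats(rawDocs.size(), parseNanos.get(), sinkNanos);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Hello, World!", "Lucene 2.3 indexing");
        Stats s = run(docs, 2, tokens -> { /* hand the document to the index here */ });
        System.out.println(s.docs + " docs, parse ns=" + s.parseNanos
                + ", sink ns=" + s.sinkNanos);
    }
}
```

As a rough indicator only: if parseNanos divided by the number of parser threads comes out well above sinkNanos, the parsers are the bottleneck and the single writer is keeping up, which is the situation Gary reports. In that case extra indexing threads would probably buy little.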
