lucene-java-user mailing list archives

From Leo Galambos <Le...@seznam.cz>
Subject Re: Fastest batch indexing with 1.3-rc1
Date Wed, 20 Aug 2003 22:18:45 GMT
Wouldn't it be better for Dan to skip optimizing each index before merging? 
I am not sure, but that could save him some time (provided he has enough 
file handles for the unoptimized indexes, of course). What strategy do you 
use in "nutch"?

THX

-g-


Doug Cutting wrote:

> As the index grows, disk i/o becomes the bottleneck.  The default 
> indexing parameters do a pretty good job of optimizing this.  But if 
> you have lots of CPUs and lots of disks, you might try building 
> several indexes in parallel, each containing a subset of the 
> documents, optimize each index and finally merge them all into a 
> single index at the end. But you need lots of i/o capacity for this to 
> pay off.
>
> Doug
>
> Dan Quaroni wrote:
>
>> Looks like I spoke too soon... As the index gets larger, time to merge
>> becomes prohibitively high.  It appears to increase linearly.
>>
>> Oh well.  I guess I'll just have to go with about 3ms/doc.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>
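
The strategy Doug describes (split the documents into subsets, build one index per subset in parallel, then merge everything at the end) can be sketched with a toy in-memory inverted index. This is only an illustration of the pattern under stated assumptions, not Lucene code: the class ParallelIndexSketch and the methods buildPartial and buildAndMerge are hypothetical names, the "index" is a plain map from term to document ids, and no disk i/o is modelled, so it says nothing about the i/o bottleneck itself. In Lucene the corresponding steps would go through IndexWriter (optimize() on each sub-index and addIndexes(Directory[]) for the final merge).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model of "build sub-indexes in parallel, then merge". Not the Lucene API. */
public class ParallelIndexSketch {

    /** Builds an inverted index (term -> ascending doc ids) for docs[from, to). */
    static Map<String, List<Integer>> buildPartial(List<String> docs, int from, int to) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int id = from; id < to; id++) {
            for (String term : docs.get(id).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document at most once per term.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != id) {
                    postings.add(id);
                }
            }
        }
        return index;
    }

    /** Splits docs into contiguous subsets, indexes them in parallel, then merges. */
    static Map<String, List<Integer>> buildAndMerge(List<String> docs, int workers)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, List<Integer>>>> parts = new ArrayList<>();
            int chunk = (docs.size() + workers - 1) / workers; // ceiling division
            for (int from = 0; from < docs.size(); from += chunk) {
                final int start = from;
                final int end = Math.min(from + chunk, docs.size());
                parts.add(pool.submit(() -> buildPartial(docs, start, end)));
            }
            // Final merge: parts are visited in document order, so each
            // merged postings list stays sorted by doc id.
            Map<String, List<Integer>> merged = new TreeMap<>();
            for (Future<Map<String, List<Integer>>> part : parts) {
                for (Map.Entry<String, List<Integer>> e : part.get().entrySet()) {
                    merged.computeIfAbsent(e.getKey(), t -> new ArrayList<>())
                          .addAll(e.getValue());
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList(
                "fast batch indexing", "merge the index", "batch merge");
        System.out.println(buildAndMerge(docs, 2));
    }
}
```

Note that the final merge still touches every posting once, so its cost grows with the total index size. The parallel phase only pays off if the per-subset work dominates, which matches Doug's caveat about needing plenty of CPU and i/o capacity.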


