lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Leo Galambos <>
Subject Re: Fastest batch indexing with 1.3-rc1
Date Wed, 20 Aug 2003 22:18:45 GMT
Isn't it better for Dan to skip the optimization phase before merging? I 
am not sure, but he could save some time on this (if he has enough file 
handles for that, of course). What strategy do you use in "nutch"?



Doug Cutting wrote:

> As the index grows, disk i/o becomes the bottleneck.  The default 
> indexing parameters do a pretty good job of optimizing this.  But if 
> you have lots of CPUs and lots of disks, you might try building 
> several indexes in parallel, each containing a subset of the 
> documents, optimize each index and finally merge them all into a 
> single index at the end. But you need lots of i/o capacity for this to 
> pay off.
> Doug
> Dan Quaroni wrote:
>> Looks like I spoke too soon... As the index gets larger, time to merge
>> becomes prohibitably high.  It appears to increase linearly.
>> Oh well.  I guess I'll just have to go with about 3ms/doc.
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

View raw message