lucene-java-user mailing list archives

From Paul Smith
Subject Re: Best Practices for Distributing Lucene Indexing and Searching
Date Fri, 15 Jul 2005 06:05:31 GMT

On 15/07/2005, at 3:57 PM, Otis Gospodnetic wrote:

> The problem that I saw (from your email only) with the "ship the full
> little index to the Queen" approach is that, from what I understand,
> you eventually do addIndexes(Directory[]) in there, and as this
> optimizes things in the end, this means your whole index gets
> re-written to disk after each such call.

Yep, which is why I placed each partial index received from a worker
on the queen's local disk, left it there until all the partial
indexes had come in, and then did a final UberMerge of all of them in
one hit.
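
A minimal sketch of that staging idea, to make the "merge once at the end" step concrete. This is not Lucene code: `StagedMerger`, `receive`, `allReceived` and `mergeAll` are hypothetical names, and an "index" is modelled as a plain term -> document-id map. In a real setup the staged items would presumably be index directories on disk, and the final step would be the single addIndexes(Directory[]) call mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for the queen. An "index" here is only a
// term -> sorted set of document ids; no Lucene classes are involved.
class StagedMerger {
    private final List<Map<String, SortedSet<Integer>>> staged = new ArrayList<>();
    private final int expectedParts;

    StagedMerger(int expectedParts) {
        this.expectedParts = expectedParts;
    }

    // Stage a partial index from a worker. No merge work happens here,
    // so receiving a part never rewrites what is already staged.
    void receive(Map<String, SortedSet<Integer>> partial) {
        staged.add(partial);
    }

    // True once every expected partial index has arrived.
    boolean allReceived() {
        return staged.size() >= expectedParts;
    }

    // The single final merge: one pass over all staged parts.
    Map<String, SortedSet<Integer>> mergeAll() {
        if (!allReceived()) {
            throw new IllegalStateException("still waiting for partial indexes");
        }
        Map<String, SortedSet<Integer>> merged = new TreeMap<>();
        for (Map<String, SortedSet<Integer>> partial : staged) {
            for (Map.Entry<String, SortedSet<Integer>> e : partial.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                      .addAll(e.getValue());
            }
        }
        return merged;
    }
}
```

The point of the design is that the expensive step runs once, however many workers report in, rather than once per arriving part.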

> As for MapReduce, from what I understand, it's quite a bit more
> complicated under the hood, but very simple on the surface - given a
> single big task, chop it up into a number of smaller ones, put them in
> the massive, parallel system, and re-assemble them when they are done.

Is this sort of like the Fork-Join thing that Doug Lea talks about in  
his concurrency book?  Anyway, the concept you mention is exactly the  
one I'm interested in.  I'll have to hunt through the Nutch stuff to  
see.  I guess it all depends on whether a problem can be easily and
programmatically decomposed into smaller units.
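
For what it's worth, the chop-it-up-and-reassemble shape can be sketched in fork/join style roughly as below. This uses `ForkJoinPool` and `RecursiveTask` from `java.util.concurrent`, which only arrived in Java 7, so it postdates this thread. The task itself (counting whitespace-separated tokens), the class name `TokenCountTask` and the `THRESHOLD` value are assumptions picked purely for illustration. It has nothing to do with how Nutch's MapReduce is actually built.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration: counts whitespace-separated tokens across a
// list of documents by recursively splitting the list in half.
class TokenCountTask extends RecursiveTask<Long> {
    // Assumed cut-off: at or below this many documents, count directly.
    private static final int THRESHOLD = 2;
    private final List<String> docs;

    TokenCountTask(List<String> docs) {
        this.docs = docs;
    }

    @Override
    protected Long compute() {
        if (docs.size() <= THRESHOLD) {
            // Small enough: do the work in this task.
            long total = 0;
            for (String doc : docs) {
                String trimmed = doc.trim();
                if (!trimmed.isEmpty()) {
                    total += trimmed.split("\\s+").length;
                }
            }
            return total;
        }
        // Decompose into two smaller units.
        int mid = docs.size() / 2;
        TokenCountTask left = new TokenCountTask(docs.subList(0, mid));
        TokenCountTask right = new TokenCountTask(docs.subList(mid, docs.size()));
        left.fork();                        // run the left half asynchronously
        long rightCount = right.compute();  // do the right half in this thread
        return left.join() + rightCount;    // re-assemble the partial results
    }

    // Convenience entry point using the shared common pool (Java 8+).
    static long countTokens(List<String> docs) {
        return ForkJoinPool.commonPool().invoke(new TokenCountTask(docs));
    }
}
```

Calling `TokenCountTask.countTokens(docs)` on a list of strings returns the total token count. The sketch only works because the problem splits cleanly and the partial results combine with a plain sum, which is exactly the decomposability question.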

> I'm not sure how generic or Nutch-specific Doug and Mike's MapReduce
> code is in Nutch, I haven't been paying close enough attention.

Me neither... :)  I didn't even know Nutch was now fully in the ASF,
and I'm a Member... :-$
