hadoop-common-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Lucene index creation using Hadoop
Date Thu, 09 Jul 2009 16:57:42 GMT
Exactly as we do.

Also, I find that with a collection large enough for speed to matter, we
have many more shards than we have reducers, so parallelism in indexing is
nearly perfect.
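
One way to see why having many more units of work than workers gives near-perfect parallelism is a small scheduling simulation. This is only a rough sketch under assumptions that are not in the message: each shard is treated as an independent task, the per-shard costs in `uneven_costs` are made up, and the greedy "next free worker" policy stands in for whatever the real job scheduler does.

```python
import heapq

def makespan(task_costs, num_workers):
    """Greedy scheduling: each task goes to the worker that frees up first."""
    finish_times = [0] * num_workers
    heapq.heapify(finish_times)
    for cost in task_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

def efficiency(task_costs, num_workers):
    """Fraction of total worker time spent doing useful work (1.0 is perfect)."""
    return sum(task_costs) / (num_workers * makespan(task_costs, num_workers))

def uneven_costs(n):
    # Hypothetical per-shard costs cycling through 1..5 units.
    return [1 + (i % 5) for i in range(n)]

few = efficiency(uneven_costs(10), 8)    # barely more tasks than workers
many = efficiency(uneven_costs(400), 8)  # many more tasks than workers
print(f"10 shards on 8 workers:  {few:.3f}")
print(f"400 shards on 8 workers: {many:.3f}")
```

With only a few more tasks than workers, most workers sit idle while the last tasks finish; with many more tasks, the idle tail is small relative to the total run time.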

On Thu, Jul 9, 2009 at 9:13 AM, Ken Krugler <kkrugler_lists@transpac.com> wrote:

> We wind up with one index (shard) per reducer, so by controlling the number
> of reducers we can vary the shard count, down to a minimum count == the
> number of slaves in the processing cluster.
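
The relationship between reducer count and shard count described above can be sketched in a few lines. This is an illustration, not the actual job: it does not use Hadoop or Lucene, the names `shard_for` and `build_shards` are hypothetical, and CRC32 stands in for Java's `hashCode()`. The modulo step mirrors the way Hadoop's default hash partitioner assigns a key to a reduce task.

```python
import zlib
from collections import defaultdict

def shard_for(doc_key: str, num_reducers: int) -> int:
    """Pick the reducer (and therefore the shard) for a document key."""
    # CRC32 is a stand-in for Java's hashCode(); the mask keeps it non-negative.
    return (zlib.crc32(doc_key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers

def build_shards(doc_keys, num_reducers):
    """Group keys by reducer; each group models one index shard."""
    shards = defaultdict(list)
    for key in doc_keys:
        shards[shard_for(key, num_reducers)].append(key)
    return dict(shards)

docs = [f"doc-{i}" for i in range(1000)]  # hypothetical document keys
for n in (4, 8):
    shards = build_shards(docs, n)
    sizes = sorted(len(v) for v in shards.values())
    print(f"{n} reducers -> {len(shards)} shards, sizes {sizes}")
```

Changing `num_reducers` is the only knob here: the shard count can never exceed it, and every document lands in exactly one shard.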

Ted Dunning, CTO
