From: Ted Dunning <ted.dunning@gmail.com>
Date: Thu, 9 Jul 2009 09:57:42 -0700
Message-ID: <c7d45fc70907090957l48923e3er365bd58c5ef3d62f@mail.gmail.com>
Subject: Re: Lucene index creation using Hadoop
To: common-user@hadoop.apache.org

Exactly as we do.

Also, I find that once a collection is large enough for speed to matter, we
have many more shards than reducers, so parallelism in indexing is nearly
perfect.
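Why many more shards than reducers gives near-perfect parallelism can be seen with a small simulation. This is a minimal Python sketch under stated assumptions, not code from this thread: it assumes each shard is one reduce task, that "reducers" means reduce slots running at once, that a task's cost is roughly proportional to its shard size, and that the next task always goes to whichever slot frees up first. The skewed cost distribution in the demo is invented for illustration, not measured.

```python
import heapq
import random


def makespan(shard_costs, slots):
    """Finish time when shards (reduce tasks) are handed, in order, to whichever
    reduce slot becomes idle first (simple list scheduling)."""
    free_at = [0.0] * slots  # time at which each slot is next idle
    heapq.heapify(free_at)
    for cost in shard_costs:
        start = heapq.heappop(free_at)  # earliest-idle slot takes the next shard
        heapq.heappush(free_at, start + cost)
    return max(free_at)


def efficiency(shard_costs, slots):
    """Ideal time (total work spread perfectly over all slots) divided by the makespan."""
    total = sum(shard_costs)
    if total == 0:
        return 1.0
    return (total / slots) / makespan(shard_costs, slots)


if __name__ == "__main__":
    rng = random.Random(42)
    slots = 10
    for shards in (10, 100, 1000):
        # Hypothetical, skewed per-shard indexing costs (exponentially distributed).
        costs = [rng.expovariate(1.0) for _ in range(shards)]
        print(f"{shards:5d} shards on {slots} slots: efficiency {efficiency(costs, slots):.3f}")
```

With one shard per slot, the slowest shard sets the finish time; with many small shards, slots that finish early simply pick up more work, so the efficiency ratio tends toward 1.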

On Thu, Jul 9, 2009 at 9:13 AM, Ken Krugler <kkrugler_lists@transpac.com> wrote:

>
> We wind up with one index (shard) per reducer, so by controlling the number
> of reducers we can vary the shard count, down to a minimum count == the
> number of slaves in the processing cluster.
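The one-index-per-reducer layout quoted above can be sketched as follows. This is a minimal Python illustration, not code from either poster: it assumes documents are keyed by a string id and routed the way Hadoop's default HashPartitioner does it, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, with zlib.crc32 swapped in for Java's hashCode(), so real shard numbers would differ. The names shard_for, build_shards and choose_num_reducers are hypothetical; the last one just encodes the "no fewer shards than slaves" floor from the quote.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id: str, num_reducers: int) -> int:
    """Pick a shard (= reducer) for a document: non-negative hash modulo reducer count.
    zlib.crc32 is a stand-in for Java's hashCode(); it is stable across runs."""
    h = zlib.crc32(doc_id.encode("utf-8")) & 0x7FFFFFFF
    return h % num_reducers


def build_shards(doc_ids, num_reducers):
    """Group document ids by reducer; each reducer would write one Lucene index (shard)."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_reducers)].append(doc_id)
    return dict(shards)


def choose_num_reducers(desired_shards: int, num_slaves: int) -> int:
    """Hypothetical policy: never use fewer reducers (shards) than slave nodes."""
    return max(desired_shards, num_slaves)


if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(1000)]
    n = choose_num_reducers(desired_shards=4, num_slaves=8)
    shards = build_shards(docs, n)
    for shard_id in sorted(shards):
        print(shard_id, len(shards[shard_id]))
```

Because the shard is a pure function of the document id and the reducer count, changing the number of reducers is all it takes to change the shard count.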


-- 
Ted Dunning, CTO
DeepDyve
