Subject: RE: Compression using Hadoop...
Date: Fri, 31 Aug 2007 11:40:22 -0700
From: "Ted Dunning"
To: hadoop-user@lucene.apache.org

My 10x was very rough. I based it on:

a) you want a few files per map task
b) you want a map task per core

I tend to use quad-core machines, and so I used 2 x 8 = 16 (roughly 10). On EC2, you don't have multi-core machines (I think), so you might be fine with 2-4 files per CPU.

-----Original Message-----
From: C G [mailto:parallelguy@yahoo.com]
Sent: Fri 8/31/2007 11:21 AM
To: hadoop-user@lucene.apache.org
Subject: RE: Compression using Hadoop...

> Ted, from what you are saying I should be using at least 80 files given the cluster size, and I should modify the loader to be aware
> of the number of nodes and split accordingly. Do you concur?
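
To make the rule of thumb above concrete, here is a minimal sketch of the arithmetic in Java. The numbers (nodes, cores per node, files per core) are hypothetical placeholders you would swap in for your own cluster; nothing here is read from Hadoop itself.

    // Back-of-the-envelope sizing: a few files per map task, one map task
    // per core, so target file count ~= filesPerCore * coresPerNode * nodes.
    public class FileCountEstimate {
        public static void main(String[] args) {
            int nodes = 10;          // hypothetical cluster size
            int coresPerNode = 8;    // e.g. two quad-core CPUs per node
            int filesPerCore = 2;    // "a few files per map task"

            int targetFiles = filesPerCore * coresPerNode * nodes;
            System.out.println("Aim for about " + targetFiles + " input files");
        }
    }

For a 10-node cluster of 8-core machines this suggests about 160 files; with 2-4 files per CPU on single-core EC2 instances, the same arithmetic gives 20-40 files for 10 nodes.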