hadoop-common-user mailing list archives

From "Ted Dunning" <tdunn...@veoh.com>
Subject RE: Compression using Hadoop...
Date Fri, 31 Aug 2007 18:40:22 GMT

My 10x was very rough.

I based it on:

a) you want a few files per map task
b) you want a map task per core

I tend to use quad-core machines, so I used 2 x 4 = 8, which I rounded to 10 (roughly).
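To make the arithmetic concrete, here is a minimal sketch of the rule of thumb above (one map task per core, a few files per map task). The function names are made up for illustration, and the values 2 files per map task, 4 cores per node, and 8 nodes are assumptions chosen to match the rough figures in this thread; they are not measured numbers.

```python
def files_per_node(cores_per_node: int, files_per_map_task: int) -> int:
    """Rule of thumb: one map task per core, a few input files per map task."""
    return cores_per_node * files_per_map_task


def total_input_files(nodes: int, per_node: int) -> int:
    """Scale the per-node figure up to the whole cluster."""
    return nodes * per_node


if __name__ == "__main__":
    # Assumed values: quad-core nodes, 2 files per map task.
    per_node = files_per_node(cores_per_node=4, files_per_map_task=2)
    print(per_node)  # 8, which the rough estimate rounds to 10

    # Hypothetical 8-node cluster, using the rounded per-node figure of 10.
    print(total_input_files(nodes=8, per_node=10))  # 80
```

On single-core nodes the same function with cores_per_node=1 gives just the files-per-task figure, which is why a smaller multiplier may be enough there.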

On EC2, you don't have multi-core machines (I think) so you might be fine with 2-4 files per

-----Original Message-----
From: C G [mailto:parallelguy@yahoo.com]
Sent: Fri 8/31/2007 11:21 AM
To: hadoop-user@lucene.apache.org
Subject: RE: Compression using Hadoop...
> Ted, from what you are saying I should be using at least 80 files given the cluster size,
> and I should modify the loader to be aware of the number of nodes and split accordingly.
> Do you concur?
