hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Which strategy is proper to run an this enviroment?
Date Sat, 12 Feb 2011 19:33:55 GMT
This sounds like it will be very inefficient.  There is considerable
overhead in starting Hadoop jobs.  As you describe it, you will be starting
thousands of jobs and paying this penalty many times.

Is there a way that you could process all of the directories in one
map-reduce job?  Can you combine these directories into a single directory
with a few large files?
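
A minimal sketch of the "few large files" idea, not a definitive implementation. It assumes the inputs are line-oriented UTF-8 text and, for simplicity, sit on a local filesystem rather than HDFS. The function name `combine_directories`, the `target_bytes` parameter, and the tab-separated record layout are all hypothetical choices made for illustration. Each line is tagged with its source directory and file name, so a single map-reduce job can still group records by directory:

```python
from pathlib import Path


def combine_directories(root, out_dir, target_bytes=1 << 30):
    """Concatenate every file under root/<dir>/ into a few large part files.

    Each output line is "<dir name>\t<file name>\t<original line>", so a
    single job can still tell which directory a record came from.
    Assumption: out_dir is NOT located inside root.
    """
    root, out_dir = Path(root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    parts = []      # paths of the part files written so far
    out = None      # currently open part file
    written = 0     # bytes written to the current part file

    for d in sorted(p for p in root.iterdir() if p.is_dir()):
        for f in sorted(p for p in d.iterdir() if p.is_file()):
            with f.open("r", encoding="utf-8") as src:
                for line in src:
                    # Roll over to a new part file once the current one
                    # has reached the target size.
                    if out is None or written >= target_bytes:
                        if out is not None:
                            out.close()
                        part = out_dir / f"part-{len(parts):05d}.txt"
                        parts.append(part)
                        out = part.open("w", encoding="utf-8")
                        written = 0
                    stripped = line.rstrip("\n")
                    record = f"{d.name}\t{f.name}\t{stripped}\n"
                    out.write(record)
                    written += len(record.encode("utf-8"))

    if out is not None:
        out.close()
    return parts
```

With the directory name as the leading field, a mapper could emit it as (part of) the key, so the per-directory results come out of one job instead of thousands.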

On Fri, Feb 11, 2011 at 8:07 PM, Jun Young Kim <juneng603@gmail.com> wrote:

> Hi.
> I have a small cluster (9 nodes) running Hadoop here.
> On this cluster, Hadoop will process thousands of directories sequentially.
> Each directory contains two input files for map-reduce. The input files
> range in size from 1 MB to 5 GB.
> In summary, each Hadoop job will take one of these directories as input.
> Which strategy would give us the best performance?
> Could you suggest one?
> Which configuration is best?
> P.S. Each node has 12 GB of physical memory.
