hadoop-common-user mailing list archives

From Arun C Murthy <ar...@yahoo-inc.com>
Subject Re: InputFiles, Splits, Maps, Tasks Questions 1.3 Base
Date Wed, 17 Oct 2007 08:05:24 GMT
Lance,

On Tue, Oct 16, 2007 at 11:27:54PM -0700, Lance Amundsen wrote:
>
>I am struggling to control the behavior of the framework.  The first
>problem is simple: I want to run many simultaneous mapper tasks on each
>node.  I've scoured the forums, done the obvious, and I still typically get
>only 2 tasks per node at execution time.  If it is a big job, sometimes I
>see 3.  Note that the administrator reports 40 Tasks/Node in the config,
>but the most I've ever seen running is 3 (and this with a single input file
>of 10,000 records, magically yielding 443 maps).
>
>And "magically" is the next issue.  I want fine-grained control over the
>relationship between the input file, the number of input records, and the
>number of maps.  For my immediate problem, I want to use a single input
>file with a number of records yielding the exact same number of maps (all
>kicked off simultaneously, BTW).  Since I did not get this behavior with
>the standard InputFileFormat, I created my own input format class and
>record reader, and am now getting the "1 file with n records to n maps"
>relationship... but the problem is that I am not even sure why.
>

I'm in the process of documenting these better (http://issues.apache.org/jira/browse/HADOOP-2046);
meanwhile, here are some pointers:
http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces
and
http://wiki.apache.org/lucene-hadoop/FAQ#10
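
As a rough illustration of why the map count tracks input splits rather than
records, the sketch below mimics the split-size arithmetic commonly described
for the stock file-based input format: the requested number of maps is only a
hint, and each split is sized as max(minSize, min(fileSize / requestedMaps,
blockSize)). The class and method names here (SplitEstimate, estimateSplits)
are hypothetical, the 10% slack factor is an assumption about the framework's
behavior, and none of this is taken from the Hadoop source of any particular
release, so treat it as a back-of-the-envelope model only.

```java
public class SplitEstimate {
    // Assumed slack factor: the final split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is capped by the block size and floored by the minimum split size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Estimate how many splits (and therefore map tasks) one file would yield.
    static int estimateSplits(long fileSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = fileSize / Math.max(requestedMaps, 1);
        // Guard against a zero split size, which would never consume the file.
        long splitSize = Math.max(1L, computeSplitSize(goalSize, minSize, blockSize));
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // the tail of the file becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Small file, 10 maps requested: the hint is honored because goalSize < blockSize.
        System.out.println(estimateSplits(1_000_000L, 10, 1L, 64 * mb));
        // Large file, 1 map requested: the block size forces more splits than requested.
        System.out.println(estimateSplits(200 * mb, 1, 1L, 64 * mb));
    }
}
```

Under this model, a custom input format that returns one split per record is
what produces the "n records to n maps" behavior, since the framework runs one
map task per split regardless of how many records each split holds.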

Hope this helps...

Arun

>Any guidance appreciated.
>
>
