hadoop-mapreduce-user mailing list archives

From Shrijeet Paliwal <shrij...@rocketfuel.com>
Subject Re: Control the number of Mappers
Date Thu, 25 Nov 2010 20:32:53 GMT
More to your need (I had missed this earlier):
>> The number of cores is not something I know in advance, so writing a
>> special InputFormat might be tricky, unless I can query Hadoop for the
>> available # of cores

You don't have to write a fancy InputFormat.
Once you have a (correct) implementation of MultiFileInputFormat,
then from my driver program which launches my MapReduce job I would do
something like this:

int numMappers = myMagicalFunctionReturningNumOfCores();
// setNumMapTasks() is only a hint to the framework, but
// MultiFileInputFormat's getSplits() honors the requested number.
job.setNumMapTasks(numMappers);
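
As for the magical function: one untested sketch, using the old mapred
API, is to ask the JobTracker for the cluster-wide map slot capacity.
Slots are whatever each tasktracker was configured with, so this only
approximates the number of cores. The class name and the JobConf
parameter are my own additions:

import java.io.IOException;

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class CoreCounter {
  // Total map slot capacity across all live tasktrackers.
  public static int myMagicalFunctionReturningNumOfCores(JobConf conf)
      throws IOException {
    ClusterStatus status = new JobClient(conf).getClusterStatus();
    return status.getMaxMapTasks();
  }
}

Pass in the same JobConf you build the job with, so the JobClient can
find the JobTracker.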

-Shrijeet

On Thu, Nov 25, 2010 at 12:23 PM, Shai Erera <serera@gmail.com> wrote:
>
> Thanks, I'll take a look
>
> On Thu, Nov 25, 2010 at 10:20 PM, Shrijeet Paliwal <shrijeet@rocketfuel.com> wrote:
>>
>> Shai,
>> You will have to implement MultiFileInputFormat and set that as your input format.
>> You may find http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/examples/MultiFileWordCount.html useful.
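>>
>> For reference, here is a minimal untested sketch against the 0.20 mapred
>> API. The class and method names are illustrative, and the record reader
>> simply hands each mapper whole files (key = path, value = contents):
>>
>> import java.io.IOException;
>>
>> import org.apache.hadoop.fs.FSDataInputStream;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.BytesWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapred.InputSplit;
>> import org.apache.hadoop.mapred.JobConf;
>> import org.apache.hadoop.mapred.MultiFileInputFormat;
>> import org.apache.hadoop.mapred.MultiFileSplit;
>> import org.apache.hadoop.mapred.RecordReader;
>> import org.apache.hadoop.mapred.Reporter;
>>
>> public class WholeFileMultiFileInputFormat
>>     extends MultiFileInputFormat<Text, BytesWritable> {
>>
>>   @Override
>>   public RecordReader<Text, BytesWritable> getRecordReader(
>>       InputSplit split, JobConf job, Reporter reporter) throws IOException {
>>     return new WholeFileRecordReader((MultiFileSplit) split, job);
>>   }
>>
>>   // Emits one record per file in the split.
>>   static class WholeFileRecordReader
>>       implements RecordReader<Text, BytesWritable> {
>>     private final MultiFileSplit split;
>>     private final JobConf job;
>>     private int index = 0;
>>
>>     WholeFileRecordReader(MultiFileSplit split, JobConf job) {
>>       this.split = split;
>>       this.job = job;
>>     }
>>
>>     public boolean next(Text key, BytesWritable value) throws IOException {
>>       if (index == split.getNumPaths()) return false;
>>       Path path = split.getPath(index);
>>       byte[] contents = new byte[(int) split.getLength(index)];
>>       FSDataInputStream in = path.getFileSystem(job).open(path);
>>       try {
>>         in.readFully(0, contents);   // read the whole file
>>       } finally {
>>         in.close();
>>       }
>>       key.set(path.toString());
>>       value.set(contents, 0, contents.length);
>>       index++;
>>       return true;
>>     }
>>
>>     public Text createKey() { return new Text(); }
>>     public BytesWritable createValue() { return new BytesWritable(); }
>>     public long getPos() { return index; }
>>     public float getProgress() {
>>       return split.getNumPaths() == 0 ? 1f : (float) index / split.getNumPaths();
>>     }
>>     public void close() {}
>>   }
>> }
>>
>> Then in the driver set conf.setInputFormat(WholeFileMultiFileInputFormat.class);
>> MultiFileInputFormat's getSplits() uses the requested number of map
>> tasks to decide how many files go into each split.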
>>
>> On Thu, Nov 25, 2010 at 12:01 PM, Shai Erera <serera@gmail.com> wrote:
>>>
>>> I wasn't talking about how to configure the cluster to not invoke more
>>> than a certain # of Mappers simultaneously. Instead, I'd like to
>>> configure a (certain) job to invoke exactly N Mappers, where N is the
>>> number of cores in the cluster, regardless of the size of the data.
>>> This is not critical if it can't be done, but it can improve the
>>> performance of my job if it can be done.
>>>
>>> Thanks
>>> Shai
>>>
>>> On Thu, Nov 25, 2010 at 9:55 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>>>
>>>> Hi,
>>>>
>>>> 2010/11/25 Shai Erera <serera@gmail.com>:
>>>> > Is there a way to make MapReduce create exactly N Mappers? More
>>>> > specifically, if say my data can be split to 200 Mappers, and I have
>>>> > only 100 cores, how can I ensure only 100 Mappers will be created?
>>>> > The number of cores is not something I know in advance, so writing a
>>>> > special InputFormat might be tricky, unless I can query Hadoop for
>>>> > the available # of cores (in the entire cluster).
>>>>
>>>> You can configure, on a node-by-node basis, how many map and reduce
>>>> tasks can be started by the tasktracker on that node.
>>>> This is done via conf/mapred-site.xml using these two settings:
>>>> mapred.tasktracker.{map|reduce}.tasks.maximum
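>>>>
>>>> For example, in conf/mapred-site.xml (the values here are only
>>>> illustrative; size them to the hardware of each node):
>>>>
>>>>   <property>
>>>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>>>     <value>4</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>>>     <value>2</value>
>>>>   </property>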
>>>>
>>>> Have a look at this page for more information
>>>> http://hadoop.apache.org/common/docs/current/cluster_setup.html
>>>>
>>>> --
>>>> Met vriendelijke groeten,
>>>>
>>>> Niels Basjes
>>>
>>
>
