hadoop-user mailing list archives

From Arindam Choudhury <arindamchoudhu...@gmail.com>
Subject Re: Running terasort with 1 map task
Date Tue, 26 Feb 2013 11:52:18 GMT
Thanks. As Julien said, I want to do a performance measurement.

Actually,
hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
-Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map

has generated:
Total size:    3200029737 B
Total dirs:    3
Total files:    5
Total blocks (validated):    27 (avg. block size 118519619 B)

That's why there are so many maps.
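
For what it's worth, the numbers add up: teragen writes 100-byte rows, so
32,000,000 rows is about 3.2 GB, and with what looks like a 128 MB HDFS block
size that is 3,200,000,000 / 134,217,728 ≈ 24 data blocks, hence 24 input
splits and 24 map tasks (mapred.map.tasks is only a hint to the InputFormat).
If I still want a single map over this data, one option (untested, and
assuming the input is a single 3.2 GB file) should be to raise the minimum
split size above the file size:

hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.min.split.size=4000000000
-Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1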


On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <julien.muller@ezako.com> wrote:

> Maybe your goal is to have a baseline for performance measurement?
> In that case, you might want to consider running only one TaskTracker:
> you would still have multiple tasks, but they would all run on one machine.
> You could also make the mappers run serially by configuring only one map
> slot on your one-node cluster.
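>
> For example, a minimal sketch of that one-map-slot setting (assuming Hadoop
> 1.x; it goes in mapred-site.xml on the TaskTracker node and takes effect
> after a TaskTracker restart):
>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>1</value>
>   </property>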
>
> Nevertheless, I agree with Bertrand: this is not really a realistic use
> case (or maybe you can give us more clues).
>
> Julien
>
>
> 2013/2/26 Bertrand Dechoux <dechouxb@gmail.com>
>
>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>
>> It is possible to have a single mapper if the input is not splittable BUT
>> it is rarely seen as a feature.
>> One could ask why you want to use a platform for distributed computing
>> for a job that shouldn't be distributed.
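>>
>> (From memory, with FileInputFormat in 1.x the split size works out to
>> roughly max(mapred.min.split.size, min(totalSize / mapred.map.tasks,
>> blockSize)), so mapred.map.tasks can only push the number of splits up;
>> to get fewer splits than blocks you have to raise mapred.min.split.size.)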
>>
>> Regards
>>
>> Bertrand
>>
>>
>>
>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
>> arindamchoudhury0@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am trying to run terasort using one map and one reduce, so I
>>> generated the input data using:
>>>
>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>
>>> Then I launched the hadoop terasort job using:
>>>
>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>
>>> I thought it would run the job using 1 map and 1 reduce, but when I
>>> inspected the job statistics I found:
>>>
>>> hadoop job -history /user/hadoop/output1
>>>
>>> Task Summary
>>> ============================================================================
>>> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
>>> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
>>> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
>>> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
>>> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
>>> ============================================================================
>>>
>>> So, although I asked it to launch one map task, there are 24 of them.
>>>
>>> How can I solve this? How do I tell Hadoop to launch only one map?
>>>
>>> Thanks,
>>>
>>
>>
>
