hadoop-user mailing list archives

From Arindam Choudhury <arindamchoudhu...@gmail.com>
Subject Re: Running terasort with 1 map task
Date Tue, 26 Feb 2013 12:34:24 GMT
Sorry, my bad; it's solved.


On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury <
arindamchoudhury0@gmail.com> wrote:

> In my $HADOOP_HOME/conf/hdfs-site.xml, I have set the data block size:
>
> <property>
>   <name>dfs.block.size</name>
>   <value>134217728</value>
>   <final>true</final>
> </property>
>
> When running teragen, I specify it again to be sure:
>
> hadoop jar /opt/hadoop-1.0.4/hadoop-examples-1.0.4.jar teragen \
>   -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 -Ddfs.block.size=134217728 \
>   320000 /user/hadoop/input
>
> but it generates 3 blocks:
>
> hadoop fsck -blocks -files -locations /user/hadoop/input
> Status: HEALTHY
>  Total size:    32029543 B
>  Total dirs:    3
>  Total files:    4
>  Total blocks (validated):    3 (avg. block size 10676514 B)
>  Minimally replicated blocks:    3 (100.0 %)
>
> What am I doing wrong? How can I generate only one block?
>
>
>
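A quick sanity check on the block count, assuming TeraGen writes fixed 100-byte rows: 320000 rows come to 32,000,000 bytes, which fits inside one 128 MiB (134217728-byte) block, so the data file itself should occupy exactly one block. The fsck summary counts all 4 files in the 3 directories under /user/hadoop/input, so the other 2 blocks presumably belong to the roughly 29.5 KB of non-data files in that directory (the summary above does not say which files they are). A minimal Python sketch of the arithmetic; `blocks_for_file` is a hypothetical helper, not a Hadoop API:

```python
ROW_BYTES = 100          # assumption: TeraGen writes fixed-size 100-byte rows
BLOCK_SIZE = 134217728   # dfs.block.size from hdfs-site.xml (128 MiB)


def blocks_for_file(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """HDFS blocks used by one file (hypothetical helper): ceiling division,
    so a file smaller than one block still takes one block and an empty
    file takes none."""
    return (size_bytes + block_size - 1) // block_size


data_bytes = 320_000 * ROW_BYTES   # rows requested on the teragen command line
total_bytes = 32_029_543           # "Total size" reported by fsck

print(blocks_for_file(data_bytes))   # 1
print(total_bytes - data_bytes)      # 29543 bytes in the other files
```

HDFS does not pad a partial last block, so each small side file counts as one block in fsck but only uses its actual size on disk.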
> On Tue, Feb 26, 2013 at 12:52 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> Thanks. As Julien said, I want to do a performance measurement.
>>
>> Actually,
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1 \
>>   -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> has generated:
>> Total size:    3200029737 B
>> Total dirs:    3
>> Total files:    5
>> Total blocks (validated):    27 (avg. block size 118519619 B)
>>
>> That's why there are so many maps.
>>
>>
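The 24 map tasks line up with the blocks of the data file: 32000000 rows × 100 bytes = 3,200,000,000 bytes, which spans 24 blocks of 134217728 bytes. As I understand FileInputFormat.getSplits in the old mapred API of Hadoop 1.x, mapred.map.tasks is only a hint: the split size is max(minSize, min(goalSize, blockSize)) with goalSize = totalSize / numSplits, and a trailing piece up to 10% larger than the split size (SPLIT_SLOP = 1.1) is kept in one split. Names starting with "_" or "." are skipped as hidden, which would explain 27 blocks yielding 24 maps if the other 3 blocks belong to such side files (an assumption). The Python sketch below only models that arithmetic for a single file; the function names are mine, not Hadoop's:

```python
SPLIT_SLOP = 1.1  # a final chunk up to 1.1 x splitSize is not split again


def compute_split_size(goal_size: int, min_size: int, block_size: int) -> int:
    # Models FileInputFormat.computeSplitSize in the old mapred API
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size: int, num_splits_hint: int, block_size: int,
                 min_split_size: int = 1) -> int:
    """Number of splits (map tasks) for one non-empty, splittable file."""
    goal_size = file_size // max(num_splits_hint, 1)   # mapred.map.tasks hint
    min_size = max(min_split_size, 1)                   # mapred.min.split.size
    split_size = compute_split_size(goal_size, min_size, block_size)

    splits, remaining = 0, file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining != 0:
        splits += 1  # the last, possibly shorter, split
    return splits


BLOCK = 134217728        # 128 MiB
DATA = 32_000_000 * 100  # 3,200,000,000 bytes of TeraGen output

print(count_splits(DATA, num_splits_hint=1, block_size=BLOCK))     # 24
print(count_splits(DATA, 1, BLOCK, min_split_size=4_000_000_000))  # 1
```

In this model the block size caps the split size no matter what mapred.map.tasks says, so a single map for a splittable 3.2 GB file needs the minimum split size (or the block size) raised to roughly the file size; I have not verified this against terasort in 1.0.4.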
>> On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <julien.muller@ezako.com> wrote:
>>
>>> Maybe your goal is to have a baseline for performance measurement?
>>> In that case, you might want to consider running only one TaskTracker.
>>> You would still have multiple tasks, but they would run on only one
>>> machine. You could also make the mappers run serially by configuring
>>> only one map slot on your single-node cluster.
>>>
>>> Nevertheless, I agree with Bertrand: this is not really a realistic use
>>> case (or maybe you can give us more details).
>>>
>>> Julien
>>>
>>>
>>> 2013/2/26 Bertrand Dechoux <dechouxb@gmail.com>
>>>
>>>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>>>
>>>> It is possible to have a single mapper if the input is not splittable,
>>>> BUT that is rarely seen as a feature.
>>>> One could ask why you want to use a platform for distributed computing
>>>> for a job that shouldn't be distributed.
>>>>
>>>> Regards
>>>>
>>>> Bertrand
>>>>
>>>>
>>>>
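On Bertrand's point about non-splittable input: when the InputFormat reports a file as not splittable, FileInputFormat makes one split per input file, so the number of maps equals the number of input files it actually reads, and files whose names start with "_" or "." are skipped as hidden. A small Python sketch of that counting rule; the helper names and the directory listing are hypothetical, not taken from the thread or from Hadoop:

```python
from typing import Iterable

HIDDEN_PREFIXES = ("_", ".")  # names FileInputFormat treats as hidden


def is_visible(path: str) -> bool:
    """True unless the last path component starts with '_' or '.'."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    return not name.startswith(HIDDEN_PREFIXES)


def maps_for_unsplittable_input(paths: Iterable[str]) -> int:
    """One split, hence one map task, per visible input file."""
    return sum(1 for p in paths if is_visible(p))


# Hypothetical listing of an input directory written by an earlier job
listing = [
    "/user/hadoop/input32mb1map/part-00000",
    "/user/hadoop/input32mb1map/_SUCCESS",
    "/user/hadoop/input32mb1map/_logs",
]
print(maps_for_unsplittable_input(listing))  # 1
```

The trade-off Bertrand mentions follows directly: with one split per file, a single large file is processed by one map on one node, with no parallelism.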
>>>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
>>>> arindamchoudhury0@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to run terasort using one map and one reduce, so I
>>>>> generated the input data using:
>>>>>
>>>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1 \
>>>>>   -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>>>
>>>>> Then I launched the hadoop terasort job using:
>>>>>
>>>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1 \
>>>>>   -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>>>
>>>>> I thought it would run the job using 1 map and 1 reduce, but when I
>>>>> inspected the job statistics I found:
>>>>>
>>>>> hadoop job -history /user/hadoop/output1
>>>>>
>>>>> Task Summary
>>>>> ============================
>>>>> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
>>>>> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
>>>>> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
>>>>> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
>>>>> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
>>>>> ============================
>>>>>
>>>>> So, although I asked for one map task, there are 24 of them.
>>>>>
>>>>> How can I solve this problem? How do I tell Hadoop to launch only one map?
>>>>>
>>>>> Thanks,
>>>>>
>>>>
>>>>
>>>
>>
>
