hadoop-mapreduce-user mailing list archives

From karthikeyan S <karthispe...@gmail.com>
Subject Re: Hadoop shuffling traffic
Date Fri, 26 Sep 2014 05:00:49 GMT
A reducer starts its shuffle (copy) phase as soon as map output is available
from any one of the mappers. It keeps polling the AM to ask whether any mapper
has completed processing, and if so it fetches that mapper's output.
So not all the mappers of a job need to complete before the reducers start
fetching data. The reduce function itself runs only after all the map
outputs have been fetched and merged.
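
How early the reducers can be launched is governed by the
mapreduce.job.reduce.slowstart.completedmaps setting mentioned in this thread.
Here is a minimal sketch of the arithmetic, assuming the default fraction is
0.05 and that the AM rounds the required number of maps up. The function name
and the 8-map job are hypothetical, for illustration only.

```python
import math

# Assumed default of mapreduce.job.reduce.slowstart.completedmaps.
DEFAULT_SLOWSTART = 0.05


def maps_needed_before_reduces(total_maps, slowstart=DEFAULT_SLOWSTART):
    """Return how many map tasks must finish before reducers may be scheduled.

    Assumption: the threshold is ceil(slowstart * total_maps).
    """
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


if __name__ == "__main__":
    # Hypothetical job with 8 map tasks.
    for fraction in (0.05, 0.5, 1.0):
        needed = maps_needed_before_reduces(8, fraction)
        print(f"slowstart={fraction}: reducers may start after {needed} map(s)")
```

With a fraction of 1.0 no reducer would be scheduled until every map has
finished, so the shuffle would begin only after the map phase is complete.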

When a reducer starts fetching data from the mappers, it logs that
in its syslog, from what I have seen.
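
Besides the syslog, the reduce percentage printed on the console gives a rough
hint. The sketch below assumes that reduce progress is split into three equal
parts (copy, then sort, then reduce), which is how the progress is commonly
described. The function name is hypothetical.

```python
def reduce_phase(progress_percent):
    """Map a reported reduce progress percentage to an assumed phase.

    Assumption: copy (shuffle), sort and reduce each take one third
    of the reported reduce progress.
    """
    if not 0.0 <= progress_percent <= 100.0:
        raise ValueError("progress_percent must be between 0 and 100")
    if progress_percent < 100.0 / 3:
        return "copy"
    if progress_percent < 200.0 / 3:
        return "sort"
    return "reduce"


if __name__ == "__main__":
    # "map 40% reduce 10%" from the original question.
    print(reduce_phase(10))
```

Under that assumption, "map 40% reduce 10%" means the reducers are still
copying map output and the reduce function has not started yet.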

Thanks,
Karthik

On Thu, Sep 25, 2014 at 8:27 PM, Bing Jiang <jiangbinglover@gmail.com> wrote:
> See mapreduce.job.reduce.slowstart.completedmaps.
> It sets the fraction of map tasks that must complete before reduce tasks
> can kick off.
>
> 2014-09-26 8:36 GMT+08:00 Abdul Navaz <navaz.enc@gmail.com>:
>>
>> Hello,
>>
>> I have a Hadoop cluster with 1 name node and 3 data nodes. I am running
>> a sample word count job on a 1 GB file that is distributed across HDFS.
>>
>> When I run the MapReduce job, the reduce phase starts before the map phase
>> reaches 100%, e.g. map 40% reduce 10%.
>>
>> I would like to know when the shuffle traffic starts.
>>
>> -> Is there any way to find out exactly when shuffling started? Does it
>> write anything to the logs?
>> -> How can I find the total amount of shuffle traffic?
>>
>>
>>
>> Thanks & Regards,
>>
>> Abdul Navaz
>> Research Assistant
>> University of Houston Main Campus, Houston TX
>> Ph: 281-685-0388
>>
>
>
>
> --
> Bing Jiang
> Tel:(86)134-2619-1361
> weibo: http://weibo.com/jiangbinglover
> BLOG: www.binospace.com
> BLOG: http://blog.sina.com.cn/jiangbinglover
> Focus on distributed computing, HDFS/HBase
