hadoop-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: Reduce phase of wordcount
Date Sun, 05 Oct 2014 13:52:24 GMT
Don't be confused by the 6.03 MB/s figure.
The relationship between mappers and reducers is M to N, which means each mapper can send
its data to every reducer, and each reducer can receive its input from every mapper.
There could be many reasons why the reduce copy phase looks too slow. It could be that the
mappers are still running, so there is no data for the reducer to copy yet; or there are not
enough threads on the mapper or reducer side to use the remaining CPU, memory and network
bandwidth. You can look up the relevant Hadoop configuration settings and adjust them.
But getting 60 MB/s with scp and then complaining about only 6 MB/s in the log is not fair
to Hadoop. Your one reducer needs to copy data from all the mappers concurrently, which makes
it impossible to reach the speed of a one-to-one network transfer.
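To put rough numbers on the M-to-N point, here is a small Java sketch that counts how many
map-output segments the shuffle has to move. The 980 comes from the "971 of 980" in your
log. The class and method names are made up for illustration, and the value 5 for parallel
copies is only an assumption (the setting I have in mind is mapred.reduce.parallel.copies in
Hadoop 1.x; check your own configuration). It is a simplified model, not how the reducer
actually schedules its fetches.

```java
public class ShuffleFanIn {
    // Every reducer pulls one segment from every mapper, so the shuffle
    // moves mappers * reducers segments in total.
    public static long totalSegments(int mappers, int reducers) {
        return (long) mappers * reducers;
    }

    // Simplified model: if a reducer runs at most `parallelCopies` fetches
    // at a time, it needs at least ceil(mappers / parallelCopies) rounds.
    public static int fetchRounds(int mappers, int parallelCopies) {
        return (mappers + parallelCopies - 1) / parallelCopies;
    }

    public static void main(String[] args) {
        int mappers = 980;      // from "copy (971 of 980 ...)" in the log
        int reducers = 1;       // the job in question uses a single reducer
        int parallelCopies = 5; // assumed value, not taken from the log

        System.out.println("segments to fetch: " + totalSegments(mappers, reducers)); // 980
        System.out.println("fetch rounds: " + fetchRounds(mappers, parallelCopies));  // 196
    }
}
```

So the single reducer is doing 980 small fetches from many hosts, not one large copy, and
that is a very different workload from one scp between two machines.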
The reduce stage normally takes longer than the map stage, as the data HAS to be transferred
over the network.
But in the word count example, the amount of data that needs to be transferred should be
very small. You can ask yourself the following questions:
1) Should I use a combiner in this case? (Yes; for word count, it reduces the data that needs
to be transferred.)
2) Am I using all the reducers I can, if my cluster is underutilized and I want my job to
finish fast?
3) Can I add more threads in the task tracker to help? You need to dig into your logs to
find out whether your mappers or reducers are waiting for a thread from the thread pool.
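To show what question 1 buys you, here is a minimal stand-alone Java sketch of what a
combiner does for word count. It does not use the Hadoop API at all; the class, the method
names and the sample input are made up for illustration. It only compares how many records
one map task would hand to the shuffle with and without map-side combining.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {
    // Map step: emit one (word, 1) pair per token, as a word count mapper does.
    public static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Combine step: sum the counts per word on the map side, so only one
    // record per distinct word has to be sent to the reducers.
    public static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String split = "to be or not to be"; // hypothetical input split
        List<Map.Entry<String, Integer>> pairs = map(split);
        Map<String, Integer> combined = combine(pairs);

        System.out.println("records without combiner: " + pairs.size());    // 6
        System.out.println("records with combiner:    " + combined.size()); // 4
        System.out.println(combined); // {be=2, not=1, or=1, to=2}
    }
}
```

On real text the same words repeat far more often than in this toy input, so the saving is
much larger. In an actual job the combiner is registered on the job with setCombinerClass;
check your driver code to see whether it is already set.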
Yong

Date: Fri, 3 Oct 2014 18:40:16 -0300
Subject: Reduce phase of wordcount
From: renato.moutinho@gmail.com
To: user@hadoop.apache.org

Hi people,

    I'm doing some experiments with hadoop 1.2.1 running the wordcount sample on an 8-node
cluster (master + 7 slaves). By tuning the task configuration I've been able to make the map
phase run in 22 minutes. However, the reduce phase (which consists of a single job) gets stuck
at some points, making the whole job take more than 40 minutes. Looking at the logs, I've
seen several lines stuck at copy at different moments, like this:

2014-10-03 18:26:34,717 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201408281149_0019_r_000000_0
0.3302721% reduce > copy (971 of 980 at 6.03 MB/s) >
2014-10-03 18:26:37,736 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201408281149_0019_r_000000_0
0.3302721% reduce > copy (971 of 980 at 6.03 MB/s) >
2014-10-03 18:26:40,754 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201408281149_0019_r_000000_0
0.3302721% reduce > copy (971 of 980 at 6.03 MB/s) >
2014-10-03 18:26:43,772 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201408281149_0019_r_000000_0
0.3302721% reduce > copy (971 of 980 at 6.03 MB/s) >

Eventually the job ends, but this information, being repeated, makes me think it's having
difficulty transferring the parts from the map nodes. Is my interpretation correct on this?
The transfer rate is waaay too slow compared to an scp file transfer between the hosts (10
times slower). Any takes on why?

Regards,

Renato Moutinho
 		 	   		  