hadoop-mapreduce-dev mailing list archives

From brisk <mylinq...@gmail.com>
Subject shuffle in Hadoop
Date Fri, 24 Feb 2012 21:51:06 GMT
Hi,

Does anybody know how the TaskTracker serves requests from reduce tasks
for map output data? I assumed that a reduce task would open a single
connection to a TaskTracker to fetch all the map output it needs from it.
But what I observed in the TaskTracker log is the following:

......
2012-01-25 18:53:39,261 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.40:36914, bytes: 53465, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000000_0, duration: 19534060
2012-01-25 18:53:39,337 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.40:36914, bytes: 46938, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000001_0, duration: 17826996
2012-01-25 18:53:39,360 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.40:36914, bytes: 50244, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000002_0, duration: 19311879
2012-01-25 18:53:44,483 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.37:33784, bytes: 51632, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000000_0, duration: 34476770
2012-01-25 18:53:44,530 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.37:33784, bytes: 53155, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000001_0, duration: 26537404
2012-01-25 18:53:44,579 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.37:33784, bytes: 46672, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000002_0, duration: 19361024
2012-01-25 18:53:44,738 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.40:36924, bytes: 48584, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000000_0, duration: 20676388
2012-01-25 18:53:44,786 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.40:36924, bytes: 40218, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000001_0, duration: 19304853
2012-01-25 18:53:44,815 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.40:36924, bytes: 46912, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000002_0, duration: 20477429
......
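To make the pattern in the snippet easier to see, here is a minimal Python sketch that parses the nine clienttrace entries shown above (each rejoined onto one line; the elided "......" parts are not included) and groups them by the dest endpoint. It assumes that each distinct dest ip:port pair identifies one client-side connection, which is an interpretation of the log rather than something the log states. The regex and the helper name group_by_connection are illustrative only.

```python
import re
from collections import defaultdict

# The nine clienttrace entries from the snippet, each rejoined onto one line.
LOG = """\
2012-01-25 18:53:39,261 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060, dest: 10.10.101.40:36914, bytes: 53465, op: MAPRED_SHUFFLE, cliID: attempt_201201251749_0034_m_000000_0, duration: 19534060
2012-01-25 18:53:39,337 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060, dest: 10.10.101.40:36914, bytes: 46938, op: MAPRED_SHUFFLE, cliID: attempt_201201251749_0034_m_000001_0, duration: 17826996
2012-01-25 18:53:39,360 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060, dest: 10.10.101.40:36914, bytes: 50244, op: MAPRED_SHUFFLE, cliID: attempt_201201251749_0034_m_000002_0, duration: 19311879
2012-01-25 18:53:44,483 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060, dest: 10.10.101.37:33784, bytes: 51632, op: MAPRED_SHUFFLE, cliID: attempt_201201251749_0034_m_000000_0, duration: 34476770
2012-01-25 18:53:44,530 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060, dest: 10.10.101.37:33784, bytes: 53155, op: MAPRED_SHUFFLE, cliID: attempt_201201251749_0034_m_000001_0, duration: 26537404
2012-01-25 18:53:44,579 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060, dest: 10.10.101.37:33784, bytes: 46672, op: MAPRED_SHUFFLE, cliID: attempt_201201251749_0034_m_000002_0, duration: 19361024
2012-01-25 18:53:44,738 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060, dest: 10.10.101.40:36924, bytes: 48584, op: MAPRED_SHUFFLE, cliID: attempt_201201251749_0034_m_000000_0, duration: 20676388
2012-01-25 18:53:44,786 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060, dest: 10.10.101.40:36924, bytes: 40218, op: MAPRED_SHUFFLE, cliID: attempt_201201251749_0034_m_000001_0, duration: 19304853
2012-01-25 18:53:44,815 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060, dest: 10.10.101.40:36924, bytes: 46912, op: MAPRED_SHUFFLE, cliID: attempt_201201251749_0034_m_000002_0, duration: 20477429
"""

# Matches the fields visible in each clienttrace entry (illustrative regex).
PATTERN = re.compile(
    r"^(?P<ts>\S+ \S+) INFO \S+: src: (?P<src>\S+), dest: (?P<dest>\S+), "
    r"bytes: (?P<bytes>\d+), op: (?P<op>\w+), cliID: (?P<map>\S+), "
    r"duration: (?P<duration>\d+)$"
)


def group_by_connection(log: str) -> dict:
    """Map each dest ip:port to the (map attempt, bytes) pairs logged for it.

    Assumption: one dest ip:port pair corresponds to one client connection.
    """
    conns = defaultdict(list)
    for line in log.splitlines():
        m = PATTERN.match(line)
        if m and m["op"] == "MAPRED_SHUFFLE":
            conns[m["dest"]].append((m["map"], int(m["bytes"])))
    return dict(conns)


for dest, fetches in group_by_connection(LOG).items():
    print(dest, fetches)
```

Grouped this way, each of the three dest endpoints (10.10.101.40:36914, 10.10.101.37:33784 and 10.10.101.40:36924) fetches output from all three map attempts, and the two 10.10.101.40 endpoints report different byte counts for the same map attempt.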

Here the TaskTracker is src: 10.10.101.37:50060. Take the reduce task on
dest: 10.10.101.40 (there is only one reduce slot on each node): why does it
appear to set up several sequential connections (two, one after the other,
in this snippet) to the TaskTracker in order to fetch the output of map task
cliID: attempt_201201251749_0034_m_000000_0?

These two connections correspond to the following two entries in the snippet:

2012-01-25 18:53:39,261 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.40:36914, bytes: 53465, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000000_0, duration: 19534060
......
2012-01-25 18:53:44,738 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.10.101.37:50060,
dest: 10.10.101.40:36924, bytes: 48584, op: MAPRED_SHUFFLE, cliID:
attempt_201201251749_0034_m_000000_0, duration: 20676388
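The two entries can also be compared field by field. The Python sketch below uses only the timestamp, dest endpoint and byte count copied from those two lines; the helper name compare is hypothetical. It does not explain the behaviour, it only makes the differences explicit: the fetches are about 5.5 seconds apart, come from different client ports (36914 vs 36924), and return different byte counts (53465 vs 48584).

```python
from datetime import datetime

# Log timestamp format; %f accepts the 3-digit millisecond field.
FMT = "%Y-%m-%d %H:%M:%S,%f"

# (timestamp, dest endpoint, bytes) copied from the two entries above.
first = ("2012-01-25 18:53:39,261", "10.10.101.40:36914", 53465)
second = ("2012-01-25 18:53:44,738", "10.10.101.40:36924", 48584)


def compare(a, b):
    """Return the time gap and whether port and byte count match (hypothetical helper)."""
    gap = datetime.strptime(b[0], FMT) - datetime.strptime(a[0], FMT)
    return {
        "gap_seconds": gap.total_seconds(),
        "same_dest_port": a[1] == b[1],
        "same_bytes": a[2] == b[2],
    }


print(compare(first, second))
```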

Thanks for any help!

Ethan
