hadoop-user mailing list archives

From Igor Bogomolov <igor.bogomo...@gmail.com>
Subject tracking remote reads in datanode logs
Date Mon, 23 Feb 2015 18:30:40 GMT
Hi all,

In a small cluster of 5 nodes running CDH 5.3.0 (Hadoop 2.5.0), I want to
know how many remote map tasks (ones that read their input data from remote
nodes) a MapReduce job has. To find out, I took the log of each datanode and
looked for lines containing "op: HDFS_READ" whose cliID field contains a map
task ID.
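
A minimal Python sketch of the counting described above. It assumes that the
DataNode clienttrace lines follow the layout "src: ..., dest: ..., bytes: ...,
op: ..., cliID: ...", that a map task's client ID embeds its attempt ID (e.g.
DFSClient_attempt_..._m_...), and that a read counts as remote when the client
(dest) IP differs from the datanode (src) IP. The regexes, the function name
classify_reads, and that locality rule are illustrative assumptions, not part
of Hadoop. Check them against your own log lines.

```python
import re
from collections import Counter

# Assumed clienttrace layout (verify against real datanode logs):
#   src: /<datanode-ip>:<port>, dest: /<client-ip>:<port>, bytes: N,
#   op: HDFS_READ, cliID: DFSClient_..., offset: ..., ...
# Only IPv4 addresses are handled; an optional hostname before "/" is skipped.
TRACE_RE = re.compile(
    r"src: [^/,]*/(?P<src>[\d.]+):\d+, "
    r"dest: [^/,]*/(?P<dest>[\d.]+):\d+, "
    r"bytes: \d+, op: (?P<op>\w+), cliID: (?P<cli>[^,\s]+)"
)

# Assumed Hadoop 2 map attempt ID shape, e.g. attempt_1424700000000_0001_m_000000_0
MAP_ATTEMPT_RE = re.compile(r"attempt_\d+_\d+_m_\d+_\d+")


def classify_reads(lines):
    """Count HDFS_READ clienttrace lines as map_local, map_remote or non_map.

    Heuristic (an assumption): a map read is "local" when the client IP
    equals the datanode IP, and "remote" otherwise.
    """
    counts = Counter()
    for line in lines:
        m = TRACE_RE.search(line)
        if not m or m.group("op") != "HDFS_READ":
            continue  # not a clienttrace read line
        if not MAP_ATTEMPT_RE.search(m.group("cli")):
            counts["non_map"] += 1  # e.g. DFSClient_NONMAPREDUCE_*
        elif m.group("src") == m.group("dest"):
            counts["map_local"] += 1
        else:
            counts["map_remote"] += 1
    return counts
```

Run it over each datanode's log file (any iterable of lines works, including
an open file object) and sum the resulting counters across the 5 nodes.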

Surprisingly, 4 of the datanode logs do not contain any lines with
"op: HDFS_READ". The remaining one has many such lines, but every cliID looks
like DFSClient_NONMAPREDUCE_* and none contains a map task ID.

From this I would conclude that there are no remote map tasks, but that does
not look right. Even local reads do not seem to be logged, since no line has a
cliID field containing a map task ID. Could anyone please explain what's
wrong? Why is this logging not working? (I use the default settings.)

Chris,

I found HADOOP-3062 <https://issues.apache.org/jira/browse/HADOOP-3062>, which
you implemented, and thought you might have an explanation.

Best,
Igor
