hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Identify splits processed by each mapper
Date Mon, 16 Jan 2012 17:12:23 GMT

On 16-Jan-2012, at 6:46 PM, Bejoy Ks wrote:
> A quick question. I have quite a few MapReduce jobs running on my cluster. One
> job's input has a large number of files, and I'd like to know which split was processed
> by each map task without doing any custom logging (for successful, failed & killed tasks).
> I tried digging into the JobTracker web UI, but I only got a pointer to the input split
> location, which specifies the nodes on which it is located. What I'm looking for is the
> file name and which split of that file.

Initially, the status of a task (set via the Reporter) is the FileSplit's path plus its
offset and length, but that's all.

> Where can I find this information?

Unfortunately, none of this is logged by default. Please file a JIRA to have it added, or to
discuss how to add it (and do follow up on this thread with the JIRA ID).

> Is it available, or can I make it available, in jobdetails.jsp?

No, but you can write a short utility program that emulates the splitter and prints the
resulting split mapping.
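
As a rough illustration of such a utility, here is a minimal sketch in plain Java. It is
modeled on the usual file-based split arithmetic (split size is the block size clamped
between a minimum and a maximum, with a 10% slack on the last split), but it does not call
Hadoop at all: the class name, method names, file name and sizes are all made up for
illustration. It also assumes splittable, non-empty files. A real utility would instead
ask the job's own InputFormat for its splits via getSplits() and print those, and the
order in which splits are assigned to task numbers may differ from this listing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone split emulator; illustrative only, not Hadoop code. */
final class SplitEmulator {
    // Slack factor: the final split may run up to 10% over the split size.
    static final double SPLIT_SLOP = 1.1;

    /** One emulated split: file name, start offset and length in bytes. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        // Same "path:offset+length" shape as the task status mentioned above.
        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    /** Split size: the block size clamped to the [minSize, maxSize] range. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Cuts one splittable, non-empty file into consecutive splits. */
    static List<Split> computeSplits(String path, long fileLength, long blockSize,
                                     long minSize, long maxSize) {
        List<Split> splits = new ArrayList<>();
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        // Emit full-size splits while more than SPLIT_SLOP split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(path, fileLength - remaining, splitSize));
            remaining -= splitSize;
        }
        // Whatever is left (possibly slightly over one split size) is the last split.
        if (remaining != 0) {
            splits.add(new Split(path, fileLength - remaining, remaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical input: a 250-byte file with a 100-byte block size.
        List<Split> splits =
            computeSplits("/data/input.txt", 250L, 100L, 1L, Long.MAX_VALUE);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("split " + i + " -> " + splits.get(i));
        }
    }
}
```

For the made-up 250-byte file above this prints three lines, ending with
"split 2 -> /data/input.txt:200+50". A 105-byte file with the same block size would stay
in a single split because of the slack factor.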

> Do I need to enable some configuration parameter to display the same?

No, as far as I know there is none.

> Is it possible only through custom logging? Doesn't the Hadoop framework provide this?

The framework does not provide this, so custom logging is the easiest way, if that is possible for you.

Harsh J
Customer Ops. Engineer, Cloudera
