hadoop-mapreduce-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: Identify splits processed by each mapper
Date Mon, 16 Jan 2012 17:39:40 GMT
Thanks Harsh

I wanted to find out whether this feature was already available in MapReduce
before going for custom logging. I may have to go with custom logging for
the moment.
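One minimal way to do the custom logging is to log the split from the mapper's setup() method, e.g. LOG.info("SPLIT " + context.getInputSplit()). For a FileSplit, toString() should yield path:start+length, the same form Harsh describes for the initial task status. The plain-Java sketch below assumes that format; the class name SplitString is hypothetical and it has no Hadoop dependency. It parses such a string back into its fields so the lines can be collected from the task logs afterwards.

```java
/**
 * Hypothetical helper (not part of Hadoop): parses the "path:start+length"
 * string that FileSplit.toString() is assumed to produce, e.g. the text after
 * the "SPLIT " marker in a line logged from Mapper.setup() as
 *   LOG.info("SPLIT " + context.getInputSplit());
 */
public class SplitString {
    final String path;
    final long start;
    final long length;

    SplitString(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    /** The path itself may contain ':' (e.g. hdfs://nn:8020/...), so search from the right. */
    static SplitString parse(String s) {
        int plus = s.lastIndexOf('+');           // separates start from length
        int colon = s.lastIndexOf(':', plus);    // separates path from start
        if (plus < 0 || colon < 0) {
            throw new IllegalArgumentException("not a path:start+length string: " + s);
        }
        long start = Long.parseLong(s.substring(colon + 1, plus));
        long length = Long.parseLong(s.substring(plus + 1));
        return new SplitString(s.substring(0, colon), start, length);
    }

    public static void main(String[] args) {
        SplitString sp = parse("hdfs://namenode:8020/data/logs/part-00000:134217728+67108864");
        System.out.println(sp.path + " start=" + sp.start + " length=" + sp.length);
    }
}
```

Running main prints hdfs://namenode:8020/data/logs/part-00000 start=134217728 length=67108864.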

I have filed a JIRA for this. Please review it and update it if it requires
more details.


On Mon, Jan 16, 2012 at 10:42 PM, Harsh J <harsh@cloudera.com> wrote:

> Bejoy,
> On 16-Jan-2012, at 6:46 PM, Bejoy Ks wrote:
>> A quick question. I have quite a few MapReduce jobs running on my
>> cluster. One job's input itself has a large number of files. I'd like to
>> know which split was processed by each map task without doing any custom
>> logging (for successful, failed and killed tasks). I tried digging into the
>> JobTracker web UI, but I only got a pointer to the input split location,
>> which specifies the nodes on which it is located; what I'm looking for is
>> the file name and which split of that file.
> Initially, the status (via the reporter) of a task is set to the FileSplit's
> path plus offset and length, but that's all.
>> Where can I find this information?
> Unfortunately, none of this is logged by default. Please file a JIRA to
> have it added, or to discuss how to add it (do follow up on this thread
> with the ID).
>> Is it available, or can I make it available, in jobdetails.jsp?
> No, but you can write a short utility program that emulates the splitter
> and prints the mapping with that.
>> Do I need to enable some configuration parameter to display the same?
> No, as far as I know there is none.
>> Is it possible only by custom logging? Doesn't the Hadoop framework
>> provide the same?
> The framework does not provide this, so custom logging is the easiest way,
> if it is possible for you.
> --
> Harsh J
> Customer Ops. Engineer, Cloudera
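
The short utility Harsh suggests, one that emulates the splitter and prints the mapping, could look roughly like the plain-Java sketch below. It is an approximation under stated assumptions, not Hadoop's code: every file is splittable and shares one block size, the split size follows the new-API FileInputFormat rule splitSize = max(minSize, min(maxSize, blockSize)) with a 10% slop allowed on the last split, and job submission is assumed to sort splits largest-first (stable sort) before numbering map tasks 0..n-1. Check that last point against your Hadoop version. SplitEmulator and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical offline emulation of FileInputFormat-style splitting (assumptions in lead-in). */
public class SplitEmulator {
    // Slack factor: the last split of a file may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    static final class Split {
        final String path;
        final long start;
        final long length;
        Split(String path, long start, long length) {
            this.path = path; this.start = start; this.length = length;
        }
        @Override public String toString() { return path + ":" + start + "+" + length; }
    }

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** files maps path -> length in bytes; zero-length files get no split in this sketch. */
    static List<Split> getSplits(Map<String, Long> files, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Split> splits = new ArrayList<>();
        for (Map.Entry<String, Long> f : files.entrySet()) {
            long length = f.getValue();
            long remaining = length;
            while ((double) remaining / splitSize > SPLIT_SLOP) {
                splits.add(new Split(f.getKey(), length - remaining, splitSize));
                remaining -= splitSize;
            }
            if (remaining != 0) {
                splits.add(new Split(f.getKey(), length - remaining, remaining));
            }
        }
        // Assumed task order: largest split first; List.sort is stable, so ties keep file order.
        splits.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        return splits;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("/data/a.log", 200L * 1024 * 1024); // 200 MB
        files.put("/data/b.log", 70L * 1024 * 1024);  // 70 MB
        List<Split> splits = getSplits(files, 64L * 1024 * 1024, 1L, Long.MAX_VALUE);
        for (int i = 0; i < splits.size(); i++) {
            System.out.printf("m_%06d -> %s%n", i, splits.get(i));
        }
    }
}
```

With 64 MB blocks, the 70 MB file stays a single split (70/64 is under the 1.1 slop), while the 200 MB file becomes three 64 MB splits plus an 8 MB tail. So main prints /data/b.log:0+73400320 as m_000000, then the a.log splits at offsets 0, 67108864, 134217728 and 201326592. Real getSplits also honours non-splittable (e.g. compressed) files, per-file block sizes and input path filters, so treat the output as a starting point.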
