hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1199) want InputFormat for task logs
Date Wed, 11 Apr 2007 18:39:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488143 ]

Doug Cutting commented on HADOOP-1199:
--------------------------------------

> The number of splits is equal to the number of configured map tasks. (Do folks have better
> ideas regarding how to do the split?)

I was thinking that the InputFormat would read a config parameter to get a jobId, then use
JobClient to query the jobtracker and get the URL for the task log of each task in that job,
and package these URLs into the splits.  The 'getLocations()' implementation for these splits
would return the hostname of the URL, so that attempts would be made to run the task on the
host where the log resides.  Does that make sense?
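
A minimal sketch of the split described above, written against plain Java rather than the Hadoop API so that it stands alone. TaskLogSplit, TaskLogSplits.fromUrls, and the example URL are hypothetical names used only for illustration. The JobClient query that would supply the real URLs is not shown, and getLocations() here only mirrors the method of the same name on Hadoop's InputSplit.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical split that carries the URL of a single task log. */
final class TaskLogSplit {
    private final URI logUrl;

    TaskLogSplit(URI logUrl) {
        this.logUrl = logUrl;
    }

    URI getLogUrl() {
        return logUrl;
    }

    /** Returns the host serving the log, so a scheduler could prefer that node. */
    String[] getLocations() {
        String host = logUrl.getHost();
        return host == null ? new String[0] : new String[] { host };
    }
}

/** Hypothetical helper: builds one split per task-log URL. */
final class TaskLogSplits {
    // Assumption: the URLs were already obtained from the jobtracker
    // for the configured job ID; that lookup is outside this sketch.
    static List<TaskLogSplit> fromUrls(List<String> taskLogUrls) {
        List<TaskLogSplit> splits = new ArrayList<>();
        for (String url : taskLogUrls) {
            splits.add(new TaskLogSplit(URI.create(url)));
        }
        return splits;
    }
}
```

With this shape the number of splits follows the number of task logs in the job rather than the configured number of map tasks, and each split names exactly one preferred host.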

> want InputFormat for task logs
> ------------------------------
>
>                 Key: HADOOP-1199
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1199
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Doug Cutting
>         Attachments: hadoop1199-v2.patch, hadoop1199.patch
>
>
> We should provide an InputFormat implementation that includes all the task logs from
> a job. Folks should be able to do something like:
> job = new JobConf();
> job.setInputFormatClass(TaskLogInputFormat.class);
> TaskLogInputFormat.setJobId(jobId);
> ...
> Tasks should ideally be localized to the node that each log is on.
> Examining logs should be as lightweight as possible, to facilitate debugging. It should
> not require a copy to HDFS. A faster debug loop is like a faster search engine: it makes people
> more productive. The sooner one can find that, e.g., most tasks failed with a NullPointerException
> on line 723, the better.
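
As an illustration of the kind of lightweight check the description has in mind, the sketch below tallies stack traces in log lines by exception class and first stack frame. It is plain Java with no Hadoop dependency. FailureTally and its regular expressions are hypothetical, and the sketch assumes the logs contain standard Java stack traces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that counts failures by exception and top stack frame. */
final class FailureTally {
    // Matches a stack frame such as "at org.example.MyMapper.map(MyMapper.java:723)".
    private static final Pattern FRAME =
        Pattern.compile("^\\s*at\\s+\\S+\\((\\w+\\.java):(\\d+)\\)");
    // Matches a (possibly qualified) class name ending in "Exception".
    private static final Pattern EXCEPTION =
        Pattern.compile("\\b((?:\\w+\\.)*\\w+Exception)\\b");

    static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String pending = null; // exception seen, still waiting for its first frame
        for (String line : logLines) {
            Matcher frame = FRAME.matcher(line);
            if (frame.find()) {
                if (pending != null) {
                    // Only the first frame after an exception line is counted.
                    String key = pending + " at " + frame.group(1) + ":" + frame.group(2);
                    counts.merge(key, 1, Integer::sum);
                    pending = null;
                }
                continue;
            }
            Matcher exception = EXCEPTION.matcher(line);
            if (exception.find()) {
                pending = exception.group(1);
            }
        }
        return counts;
    }
}
```

In a job built on the proposed InputFormat, the same matching could run in a map function over each log, with the counts summed in a reduce step.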


