hadoop-mapreduce-issues mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
Date Tue, 18 Aug 2009 08:31:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744415#action_12744415 ]

Jothi Padmanabhan commented on MAPREDUCE-157:

Regarding the interface for readers, we could support two kinds of users:

# Users who want fine-grained control and would handle the individual events themselves.
# Users who want coarser, summary-level information.

Users of type 1, who want finer-grained information, could use event readers to iterate
through the events and do the necessary processing themselves.
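
As a rough illustration of the type 1 usage, here is a minimal Java sketch. The names
EventType, HistoryEvent, EventReader and EventCounter are hypothetical and not an agreed
API, and the reader is backed by an in-memory list rather than a real history file.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical event kinds; the real set of events is not defined in this comment.
enum EventType { JOB_SUBMITTED, TASK_STARTED, TASK_FINISHED, JOB_FINISHED }

// Hypothetical event record: one entry of the job history.
final class HistoryEvent {
    final EventType type;
    final String id;       // job, task, or attempt id
    final long timestamp;  // milliseconds since the epoch

    HistoryEvent(EventType type, String id, long timestamp) {
        this.type = type;
        this.id = id;
        this.timestamp = timestamp;
    }
}

// Hypothetical reader. Assumption: events are already decoded and held in memory;
// a real reader would decode them from a history file or stream.
final class EventReader implements Iterable<HistoryEvent> {
    private final List<HistoryEvent> events;

    EventReader(List<HistoryEvent> events) {
        this.events = new ArrayList<>(events);
    }

    @Override
    public Iterator<HistoryEvent> iterator() {
        return events.iterator();
    }
}

final class EventCounter {
    // Type 1 usage: the caller walks every event and decides what to do with it.
    // Here the processing is simply counting events per type.
    static Map<EventType, Integer> countByType(EventReader reader) {
        Map<EventType, Integer> counts = new EnumMap<>(EventType.class);
        for (HistoryEvent event : reader) {
            counts.merge(event.type, 1, Integer::sum);
        }
        return counts;
    }
}
```

The point of this shape is that the reader only hands out events in order; all
interpretation stays with the caller.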

For users of type 2, we could provide summary-level information through a JobHistoryParser
class. This class would internally build the Job-Task-Attempt hierarchy by consuming
all events using an event reader and make the summary information available for users to access.
Users could do something like:


parser.init(historyFileOrStream);  // a history file or stream

JobInfo jobInfo = parser.getJobInfo();
// Use the getters on JobInfo (for example: start time, finish time, counters,
// id, user name, conf, total maps, total reduces, among others).

List<TaskInfo> taskInfoList = jobInfo.getAllTasks();
// Iterate through the list and do the necessary processing. Getters on
// TaskInfo would include task id, task type, status, splits, counters, etc.

for (TaskInfo taskInfo : taskInfoList) {
    List<TaskAttemptInfo> attemptsList = taskInfo.getAllAttempts();
    // TaskAttemptInfo would have getters for attempt id, errors, status,
    // state, start time, finish time, tracker name, port, etc.
}
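
To make the "build the hierarchy by consuming all events" step concrete, here is a minimal
Java sketch. JobHistoryParser, JobInfo, TaskInfo, TaskAttemptInfo, getJobInfo, getAllTasks
and getAllAttempts come from the proposal above; RawEvent, taskFor, addAttempt, setJobId and
the other getters are hypothetical, and init takes already-decoded events instead of a file
or stream to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flattened event; stands in for whatever an event reader would yield.
final class RawEvent {
    final String jobId;
    final String taskId;     // null for job-level events
    final String attemptId;  // null for job-level and task-level events

    RawEvent(String jobId, String taskId, String attemptId) {
        this.jobId = jobId;
        this.taskId = taskId;
        this.attemptId = attemptId;
    }
}

final class TaskAttemptInfo {
    private final String attemptId;
    TaskAttemptInfo(String attemptId) { this.attemptId = attemptId; }
    String getAttemptId() { return attemptId; }
}

final class TaskInfo {
    private final String taskId;
    // Keyed by attempt id so repeated events for one attempt map to one entry.
    private final Map<String, TaskAttemptInfo> attempts = new LinkedHashMap<>();

    TaskInfo(String taskId) { this.taskId = taskId; }
    String getTaskId() { return taskId; }
    void addAttempt(String attemptId) {
        attempts.computeIfAbsent(attemptId, TaskAttemptInfo::new);
    }
    List<TaskAttemptInfo> getAllAttempts() { return new ArrayList<>(attempts.values()); }
}

final class JobInfo {
    private String jobId;
    private final Map<String, TaskInfo> tasks = new LinkedHashMap<>();

    String getJobId() { return jobId; }
    void setJobId(String jobId) { this.jobId = jobId; }
    TaskInfo taskFor(String taskId) {
        return tasks.computeIfAbsent(taskId, TaskInfo::new);
    }
    List<TaskInfo> getAllTasks() { return new ArrayList<>(tasks.values()); }
}

final class JobHistoryParser {
    private final JobInfo jobInfo = new JobInfo();

    // Assumption: events are already decoded; the proposal passes a file or stream.
    void init(Iterable<RawEvent> events) {
        for (RawEvent event : events) {
            jobInfo.setJobId(event.jobId);
            if (event.taskId == null) {
                continue;  // job-level event, nothing to attach below the job
            }
            TaskInfo task = jobInfo.taskFor(event.taskId);
            if (event.attemptId != null) {
                task.addAttempt(event.attemptId);
            }
        }
    }

    JobInfo getJobInfo() { return jobInfo; }
}
```

A real parser would also copy times, counters, status and so on from each event into the
matching object; the sketch only shows how the three levels get linked together.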



> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>                 Key: MAPREDUCE-157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Owen O'Malley
>            Assignee: Jothi Padmanabhan
> Currently, parsing the job history logs with external tools is very difficult because
> of the format. The most critical problem is that newlines aren't escaped in the strings. That
> makes using tools like grep, sed, and awk very tricky.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
