hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sharad Agarwal (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
Date Thu, 10 Sep 2009 10:10:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753542#action_12753542
] 

Sharad Agarwal commented on MAPREDUCE-157:
------------------------------------------

Some comments:
 - Responsibilities of JobHistory and JobHistoryWriter are not clear. I think these should
be merged in single class and name it say JobHistoryManager
 - Base interface HistoryEvent should be public
 - Getters in event classes should be public
 - HISTORY_VERSION should be 1.0 instead of '0.21'
 - JobHistory#initDone is swallowing the exception
 - Code for localizing of job conf should be moved out of history
 - JobHistory has two logEvents methods. That it confusing. Better would be to just have one
logEvent method, and no state check in logEvents. JobInProgress will explicitly create/close
job history writers
 - HistoryCleaner thread should have a check as lastRan != 0 and not lastRan == 0. Also it
should use doneDirFs
 - Using: 
    if (jobName == null || jobName.length() == 0) {
      jobName = "NA";
    }
is not a nice thing. This was perhaps done for some reason in the old format but in new one
not required. we can have empty string in json
 - file name encoding not required. JobHistoryUtil class can go away
 - TaskInProgress and JobInProgress do a jobHistory != null check everywhere. This is not
elegant.
 - test case is missed from TestJobHistory#testDoneFolderOnHDFS 
 - EventReader -> better msg in readOneGroup - > throw new IOException("Internal error");
 - EventReader -> getHistoryEventType need not be static. It can use the instance parser
handle
 - History Viewer command line has changed. It takes file name instead of output folder. JobClient#displayUsage
and commands manual documentation should be corrected. 
 - All public APIs including constructors must be documentated






> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>    Affects Versions: 0.20.1
>            Reporter: Owen O'Malley
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.21.0
>
>         Attachments: mapred-157-10Sep.patch, mapred-157-4Sep.patch, mapred-157-7Sep-v1.patch,
mapred-157-7Sep.patch, mapred-157-prelim.patch, MAPREDUCE-157-avro.patch
>
>
> Currently, parsing the job history logs with external tools is very difficult because
of the format. The most critical problem is that newlines aren't escaped in the strings. That
makes using tools like grep, sed, and awk very tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message