hadoop-mapreduce-issues mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
Date Mon, 10 Aug 2009 08:08:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741221#action_12741221
] 

Jothi Padmanabhan commented on MAPREDUCE-157:
---------------------------------------------

Based on an offline discussion with Owen, Sharad and Devaraj, we do not appear to have
really strong use cases for supporting multiple formats for the JobHistory file. As a result,
we will tie the format to JSON and focus on reducing the number of objects created, writing
information directly to the underlying stream wherever possible. While we will retain the
event framework, we will simplify the interface compared to the previous design.


One change is to write the event type preceding the actual event object, so that event
readers can read the event type first and then instantiate the correct event class for the
object that follows. We will, however, still have only one record per line. A line in the
history file will now look like this:

{noformat}
{"EVENT_TYPE":"JOB_SUBMITTED"} {"EVENT_KIND":"JOB","JOB_ID":"job_test_0000","JOB_NAME":"TEST-JOB-SUBMITTED","USER_NAME":"Jothi","SUBMIT_TIME":1249887005100,"JOB_CONF_PATH":"/tmp"}
{noformat}
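Because the type header and the event body are two separate JSON records on one line, a reader can split the line before handing each half to a JSON parser. A minimal dependency-free sketch (the class and method names here are hypothetical, not part of the proposed patch, and the brace-counting shortcut assumes no literal braces inside string values, which holds for the type header):

```java
public class HistoryLineSplitter {
  /**
   * Splits a history line of the form {type-json} {event-json} into its
   * two records by finding the closing brace of the first record.
   * Note: counts braces naively, so it assumes the type record contains
   * no '{' or '}' characters inside string values.
   */
  public static String[] split(String line) {
    int depth = 0;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == '{') {
        depth++;
      } else if (c == '}') {
        depth--;
        if (depth == 0) {
          // First record ends here; the rest of the line is the event body.
          return new String[] {
            line.substring(0, i + 1),
            line.substring(i + 1).trim()
          };
        }
      }
    }
    throw new IllegalArgumentException("Malformed history line: " + line);
  }
}
```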

Events will now implement writeFields(JsonGenerator) and readFields(JsonParser) methods.
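To make the write path concrete, here is a sketch of what one event's serialization could look like. The class name and field set are hypothetical (modeled on the sample line above), and a plain java.io.Writer stands in for Jackson's JsonGenerator so the sketch stays dependency-free; the real methods would take JsonGenerator/JsonParser as described:

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical event class; field names follow the sample history line.
// Uses Writer in place of Jackson's JsonGenerator for illustration only.
class JobSubmittedEvent {
  private final String jobId;
  private final String jobName;

  JobSubmittedEvent(String jobId, String jobName) {
    this.jobId = jobId;
    this.jobName = jobName;
  }

  // Writes the type header and then the event record, all on one line,
  // matching the one-record-per-line format described above.
  void writeFields(Writer out) throws IOException {
    out.write("{\"EVENT_TYPE\":\"JOB_SUBMITTED\"} ");
    out.write("{\"EVENT_KIND\":\"JOB\",\"JOB_ID\":\"" + jobId
        + "\",\"JOB_NAME\":\"" + jobName + "\"}\n");
  }
}
```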

The JobHistory module would create one event writer per job ID; each event writer maps to
one history file. The event writer will internally create a JsonGenerator on this file and
use it to write each event (by calling event.writeFields).

Similarly, the job history reading module would create one event reader per job ID/file.
The reader would internally create one JsonParser, which is passed to each event's
readFields method.

{code}
interface HistoryEvent {
  void writeFields(JsonGenerator gen) throws IOException;
  void readFields(JsonParser parser) throws IOException;
}

class JobHistory {
  ...
  // Generate a history file based on jobId, then create a new EventWriter
  JsonEventWriter eventWriter = new JsonEventWriter(conf, historyFile);
  eventWriter.write(jobSubmittedEvent);
  eventWriter.write(jobFinishedEvent);
  ...
  eventWriter.close();
}

class SomeHistoryEventUser {
  JsonEventReader eventReader = new JsonEventReader(conf, historyFile);
  HistoryEvent ev;
  while ((ev = eventReader.getNextEvent()) != null) {
    // process ev
  }
  eventReader.close();
}
{code}








> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Jothi Padmanabhan
>
> Currently, parsing the job history logs with external tools is very difficult because
> of the format. The most critical problem is that newlines aren't escaped in the strings.
> That makes using tools like grep, sed, and awk very tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

