hadoop-mapreduce-issues mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
Date Tue, 04 Aug 2009 04:41:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738817#action_12738817 ]

Jothi Padmanabhan commented on MAPREDUCE-157:

Here is a proposal for writing JobHistory events in JSON format.

# We will decouple the generation of events from the actual writing/reading of events. The
JobHistory module will generate Events and pass them on to Event Writers, which do the actual
writing of events to the underlying stream. Similarly, on the reading front, Event Readers
will read data from the underlying stream and generate events, which are then passed on to
the callers (History Viewers and other external log aggregators).
# In addition, there would be a provision to stream events directly to external listeners
as and when they are generated (See HistoryListener Interface in the code snippet below).

# The Framework's event writer would write the events to a local file in JSON format. We will
use the Jackson JSON processor (http://jackson.codehaus.org/) for serialization.
# For modularity, we will have abstract classes for HistoryEvents, HistoryEventWriters and
HistoryEventReaders. Events will have a kind and a type. Examples of kinds include Job, Task
and TaskAttempt. Each kind could support multiple types. Example types for Job include
Submitted, Inited, Finished (and others).
# While writing JSON data, each record will be a separate line by itself. There will not be
any newlines within a record.
# Each event class would support a toJSON() method that serializes the event into a JsonNode.
Event writers can use this method to write the event in JSON format to the underlying
stream. If an event writer wants to write a different format, it can either parse this
JsonNode object or query the Event itself after ascertaining its kind and type.
# Similarly, each Event class would support a constructor that takes a JsonNode object, used
by the event readers to create an event instance while reading from the underlying stream.
# Currently, the JobConf object is stored as a separate file, independent of the actual
JobHistory file. We could store the conf contents as part of the history file itself, by
wrapping the conf object in a special event that is logged at job submission time.
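To make the one-record-per-line rule in point 5 concrete, here is a minimal, self-contained sketch of how a writer could keep each JSON record on a single line by escaping embedded newlines. The class and method names are illustrative only (not part of the proposed patch), and the hand-rolled escaping stands in for what Jackson would do:

```java
public class JsonLineSketch {
  // Minimal JSON string escaping -- just enough to show that a record
  // never contains a raw newline, so grep/sed/awk see one record per line.
  static String escape(String s) {
    StringBuilder sb = new StringBuilder();
    for (char c : s.toCharArray()) {
      switch (c) {
        case '"':  sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n");  break;
        case '\r': sb.append("\\r");  break;
        case '\t': sb.append("\\t");  break;
        default:   sb.append(c);
      }
    }
    return sb.toString();
  }

  // Builds one single-line JSON record for an event (fields abbreviated).
  static String record(String kind, String type, String jobName) {
    return String.format(
        "{\"EVENT_KIND\":\"%s\",\"EVENT_TYPE\":\"%s\",\"JOB_NAME\":\"%s\"}",
        escape(kind), escape(type), escape(jobName));
  }

  public static void main(String[] args) {
    // A job name containing a newline still serializes to a single line.
    String line = record("JOB", "SUBMITTED", "word\ncount");
    if (line.contains("\n")) {
      throw new AssertionError("record must stay on one line");
    }
    System.out.println(line);
  }
}
```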

Here are some illustrative code snippets:


public abstract class HistoryEvent {

  protected String type;
  protected HistoryEventKind kind;
  public static enum HistoryEventKind {JOB, TASK, TASKATTEMPT /* ... */}

  public String getEventType() { return type; }
  public HistoryEventKind getEventKind() { return kind; }
  public abstract JsonNode toJSON(JsonNodeFactory jnf);
  public HistoryEvent(JsonNode node) { }

  public HistoryEvent() {}
}

public abstract class JobHistoryEvent extends HistoryEvent {
  public JobHistoryEvent() { kind = HistoryEventKind.JOB; }
  public JobHistoryEvent(JsonNode node) { kind = HistoryEventKind.JOB; }
}

// An example implementation of the JobSubmittedEvent

public class JobSubmittedEvent extends JobHistoryEvent {

  private JobID jobid;
  private String jobName;
  private String userName;
  private long submitTime;
  private Path jobConfPath;

  public JobSubmittedEvent(JobID id, String jobName, String userName,
      long submitTime, Path jobConfPath) {
    this.jobid = id;
    this.jobName = jobName;
    this.userName = userName;
    this.submitTime = submitTime;
    this.jobConfPath = jobConfPath;
    type = "SUBMITTED";
  }

  public JobID getJobid() { return jobid; }
  public String getJobName() { return jobName; }
  public String getUserName() { return userName; }
  // other getters

  public JobSubmittedEvent(JsonNode node) {
    // Code to generate event from JsonNode
  }

  public JsonNode toJSON(JsonNodeFactory jnf) {
    ObjectNode node = new ObjectNode(jnf);
    node.put("EVENT_KIND", kind.toString());
    node.put("EVENT_TYPE", type);
    node.put("JOB_ID", jobid.toString());
    node.put("JOB_NAME", jobName);
    node.put("USER_NAME", userName);
    node.put("SUBMIT_TIME", submitTime);
    node.put("JOB_CONF_PATH", jobConfPath.toString());
    return node;
  }
}

public abstract class HistoryEventWriter {

  public abstract void open(String name);

  public abstract void write(HistoryEvent event) throws IOException;

  public abstract void flush() throws IOException;

  public abstract void close() throws IOException;
}

public abstract class HistoryEventReader {

  public abstract void open(String name) throws IOException;

  public abstract Iterator<HistoryEvent> iterator();

  public abstract void close() throws IOException;
}
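Because each event is one JSON record per line, a concrete reader's iterator() needs no framing beyond readLine(). The following sketch (class and method names are illustrative, not from the patch) shows the line-oriented loop a JSON-backed reader could use; a real implementation would hand each line to Jackson and build the corresponding HistoryEvent:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineRecordReaderSketch {
  // Collects every raw JSON line from the stream; a real reader would parse
  // each line into a JsonNode and construct the matching event class.
  static List<String> readRecords(BufferedReader in) throws IOException {
    List<String> records = new ArrayList<String>();
    String line;
    while ((line = in.readLine()) != null) {
      if (!line.isEmpty()) {
        records.add(line);
      }
    }
    return records;
  }

  public static void main(String[] args) throws IOException {
    // Two single-line records, as the proposed format guarantees.
    String log = "{\"EVENT_KIND\":\"JOB\",\"EVENT_TYPE\":\"SUBMITTED\"}\n"
               + "{\"EVENT_KIND\":\"JOB\",\"EVENT_TYPE\":\"FINISHED\"}\n";
    List<String> records =
        readRecords(new BufferedReader(new StringReader(log)));
    if (records.size() != 2 || !records.get(0).contains("SUBMITTED")) {
      throw new AssertionError("unexpected records: " + records);
    }
  }
}
```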


public interface HistoryListener {
  public void handleHistoryEvent(HistoryEvent event);
}
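As a sketch of point 2, an external aggregator could consume streamed events by implementing the listener interface. The simplified nested interfaces below stand in for the real ones so the example is self-contained; how listeners get registered with the JobHistory module is not specified here:

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {
  // Simplified stand-ins for the proposed interfaces.
  interface HistoryEvent { String getEventType(); }
  interface HistoryListener { void handleHistoryEvent(HistoryEvent event); }

  // A listener that simply records the types of the events it receives,
  // the way an external log aggregator might buffer them.
  static class CollectingListener implements HistoryListener {
    final List<String> seen = new ArrayList<String>();
    public void handleHistoryEvent(HistoryEvent event) {
      seen.add(event.getEventType());
    }
  }

  public static void main(String[] args) {
    CollectingListener listener = new CollectingListener();
    // The JobHistory module would invoke the listener as events are generated.
    listener.handleHistoryEvent(() -> "SUBMITTED");
    listener.handleHistoryEvent(() -> "FINISHED");
    if (listener.seen.size() != 2
        || !listener.seen.get(0).equals("SUBMITTED")) {
      throw new AssertionError("listener did not receive events");
    }
  }
}
```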


> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>                 Key: MAPREDUCE-157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Jothi Padmanabhan
> Currently, parsing the job history logs with external tools is very difficult because
of the format. The most critical problem is that newlines aren't escaped in the strings. That
makes using tools like grep, sed, and awk very tricky.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
