hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
Date Thu, 13 Aug 2009 04:45:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742712#action_12742712 ]

Owen O'Malley commented on MAPREDUCE-157:

I'm confused about what the goal of using Avro here would be.

Let's review the goals:
  1. Get an easily parseable text format.
  2. Not require excessive amounts of time for logging.
     2a. Not require excessive object allocations.

It seems that to use Avro, we'd need to create the Avro objects and then write them out. I'd
rather just use a JsonWriter to write the events out to the stream. Of course, reading is the
reverse. It would be like writing XML files by generating the necessary DOM objects. You can
do it (and in fact Configuration is written that way. *sigh*), but it costs a lot of time.
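
To make the streaming approach concrete, here is a minimal sketch that writes each event
straight to a Writer as one JSON object per line, without building an intermediate object
tree. The class name HistoryEventWriter, the writeEvent method, the "event" key, and the
sample field names are all hypothetical, chosen for illustration only; this is not the
actual job history format or an existing Hadoop API. Newlines inside values are escaped, so
each event stays on a single line.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch: not an existing Hadoop class.
public class HistoryEventWriter {

    // Write s as a JSON string literal, escaping quotes, backslashes,
    // and control characters (including newlines and tabs).
    static void writeString(Writer out, String s) throws IOException {
        out.write('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.write("\\\""); break;
                case '\\': out.write("\\\\"); break;
                case '\n': out.write("\\n");  break;
                case '\r': out.write("\\r");  break;
                case '\t': out.write("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Other control characters are rare; use \\uXXXX form.
                        out.write(String.format("\\u%04x", (int) c));
                    } else {
                        out.write(c);
                    }
            }
        }
        out.write('"');
    }

    // Write one event as a single-line JSON object followed by a newline.
    // keysAndValues alternates key, value, key, value, ...
    static void writeEvent(Writer out, String type, String... keysAndValues)
            throws IOException {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("keys and values must come in pairs");
        }
        out.write("{\"event\":");          // assumed key name for the event type
        writeString(out, type);
        for (int i = 0; i < keysAndValues.length; i += 2) {
            out.write(',');
            writeString(out, keysAndValues[i]);
            out.write(':');
            writeString(out, keysAndValues[i + 1]);
        }
        out.write("}\n");
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // Sample values are made up; the error text contains a raw newline and tab.
        writeEvent(out, "TaskFailed",
                "taskid", "task_0001_m_000003",
                "error", "java.io.IOException: disk full\n\tat Foo.bar(Foo.java:42)");
        System.out.print(out);
    }
}
```

Because every record ends up on exactly one line, line-oriented tools can treat one line as
one event. A real implementation would also need typed values (numbers, nested lists), which
this sketch leaves out.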

Not having seen the Avro text format, I can't evaluate how much overhead it adds. None of
the features of Avro seem compelling in this case, and they could easily lead to unfortunate choices.

Furthermore, I don't know if there are any guarantees about the Avro text format's stability.
We need stability in this format.

> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>                 Key: MAPREDUCE-157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Jothi Padmanabhan
> Currently, parsing the job history logs with external tools is very difficult because
> of the format. The most critical problem is that newlines aren't escaped in the strings. That
> makes using tools like grep, sed, and awk very tricky.
