hadoop-mapreduce-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1016) Make the format of the Job History be JSON instead of Avro binary
Date Thu, 08 Oct 2009 18:39:31 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763617#action_12763617 ]

Doug Cutting commented on MAPREDUCE-1016:

> In this jira, comparing with the current code, I am only proposing breaking down the
> first line (version string) into two lines and add an empty line after the schema (and add
> a few keywords).

I think that is overkill.  The point of the change to the log format was to make it trivial
for external tools to process these logs.  We don't want to make folks write a header
processor.  Most folks will probably just skip the first two lines and use a JSON parser.
If they want to do some error-checking, they can check that the first line is "Avro-Json".
Further, if they want, they can read the schema and use Avro.  If we permit arbitrary headers
then folks have to skip up until a blank line, which is harder, although not much.  But if they
want to check the format or use Avro then they have to write a header parser, and logic to
process the header, which I think is an unneeded imposition.

The magic signature for this format is "Avro-Json\n".  The format is:
 - first line is magic
 - second line is an Avro schema
 - subsequent lines are json-encoded instances of that schema
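
To illustrate how little an external tool needs, here is a minimal sketch of a reader for
the three-part layout above.  The function name, the sample schema, and the sample events
are hypothetical and are not taken from the patch.  Skipping blank lines is a defensive
assumption, not part of the format.

```python
import io
import json

MAGIC = "Avro-Json"  # expected content of the first line, per the layout above


def read_job_history(stream):
    """Return (schema, records) from a text stream in the three-part layout."""
    # First line: the magic signature.
    magic = stream.readline().rstrip("\n")
    if magic != MAGIC:
        raise ValueError(f"unexpected magic line: {magic!r}")
    # Second line: the Avro schema, which is itself a JSON document.
    schema = json.loads(stream.readline())
    # Remaining lines: one JSON-encoded instance of the schema per line.
    # Blank lines are skipped (an assumption, for robustness only).
    records = [json.loads(line) for line in stream if line.strip()]
    return schema, records


# Hypothetical sample; the real job-history schema and events are not shown here.
sample = io.StringIO(
    'Avro-Json\n'
    '{"type": "record", "name": "Event", "fields": [{"name": "id", "type": "int"}]}\n'
    '{"id": 1}\n'
    '{"id": 2}\n'
)
schema, records = read_job_history(sample)
print(schema["name"], records)
```

A tool that does not care about validation could drop the magic check and ignore the
schema, which amounts to skipping the first two lines and parsing the rest as JSON.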

I see no need for a more complex format at this time.  The only change from the original Json-based
format when Avro was added was changing the magic line to not have a version number and adding
a line with the schema, which is, in essence, the version number.  If you think we should
have a more general, header-based format, please file a separate issue.

> Make the format of the Job History be JSON instead of Avro binary
> -----------------------------------------------------------------
>                 Key: MAPREDUCE-1016
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1016
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Doug Cutting
>             Fix For: 0.21.0, 0.22.0
>         Attachments: MAPREDUCE-1016.patch
> I forgot that one of the features that would be nice is to offload the job history display
> from the JobTracker. That will be a lot easier if the job history is stored in JSON. Therefore,
> I think we should change the storage now to prevent incompatibilities later.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
