hadoop-mapreduce-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1016) Make the format of the Job History be JSON instead of Avro binary
Date Thu, 08 Oct 2009 08:55:31 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763420#action_12763420 ]

Hong Tang commented on MAPREDUCE-1016:
--------------------------------------

I'd like to propose the following format, which should make future extension easier without
requiring much special-case code to support backward compatibility.

Line 1: The version string. For now, we would require an exact match, and it should be "Avro".
In the future, it could be "Avro/2.0".
Line 2: "Encoder: Json" (Or "Encoder: Binary")
Line 3: "Schema: <schema string>"
Line 4: Empty line
Remaining file: encoded history-event objects.
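A minimal sketch of a writer for this layout, to make the proposal concrete. The function name write_history and its parameters are hypothetical, and plain json.dumps with one event per line stands in for Avro's JSON encoder; the proposal itself only fixes the four header lines and the encoded body.

```python
import io
import json

# Hypothetical writer for the proposed layout. json.dumps is only a stand-in
# for a real Avro JSON encoder; one event per line is an assumption.
def write_history(stream, schema, events, version="Avro", encoder="Json"):
    stream.write(version + "\n")                          # Line 1: version string
    stream.write("Encoder: " + encoder + "\n")            # Line 2: encoder header
    stream.write("Schema: " + json.dumps(schema) + "\n")  # Line 3: schema on a single line
    stream.write("\n")                                    # Line 4: empty line ends the header
    for event in events:                                  # Remainder: encoded events
        stream.write(json.dumps(event) + "\n")

buf = io.StringIO()
write_history(buf, {"type": "record"}, [{"a": 1}])
print(buf.getvalue())
```

Because the schema is serialized without indentation, it always fits on Line 3, which keeps the header strictly line-oriented.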

Essentially, the structure of this file mimics an HTTP message:

Line 1: Protocol + version. Lets JobHistoryParser determine whether it can handle the input
and, if so, which backend deserialization framework to invoke.
Line 2 up to the empty line: Header. These are the parameters needed to instantiate the
deserializer.
Remainder of the file: serialized data.
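A rough sketch of the reading side, assuming the layout above. parse_history and SUPPORTED_VERSIONS are hypothetical names, not part of JobHistoryParser; the sketch only shows the HTTP-like flow: check the version line, collect "Name: value" headers until the empty line, then hand the rest to the deserializer chosen by the Encoder header (here only a JSON-lines stand-in).

```python
import io
import json

SUPPORTED_VERSIONS = {"Avro"}  # exact match required for now, per the proposal

def parse_history(stream):
    version = stream.readline().rstrip("\n")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported history version: %r" % version)
    headers = {}
    while True:
        line = stream.readline()
        if line == "":
            raise ValueError("end of file inside header")
        line = line.rstrip("\n")
        if line == "":                     # empty line terminates the header
            break
        name, sep, value = line.partition(": ")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        headers[name] = value
    encoder = headers.get("Encoder")
    if encoder != "Json":                  # a binary decoder would be dispatched here
        raise ValueError("unsupported encoder: %r" % encoder)
    events = [json.loads(l) for l in stream if l.strip()]
    return version, headers, events

src = io.StringIO('Avro\nEncoder: Json\nSchema: {"type": "record"}\n\n{"a": 1}\n')
print(parse_history(src))
```

A future "Avro/2.0" file is rejected outright rather than half-parsed, which is the point of requiring an exact version match.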

Does that make sense?

> Make the format of the Job History be JSON instead of Avro binary
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1016
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1016
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Doug Cutting
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: MAPREDUCE-1016.patch
>
>
> I forgot that one of the features that would be nice is to offload the job history display
> from the JobTracker. That will be a lot easier if the job history is stored in JSON. Therefore,
> I think we should change the storage now to prevent incompatibilities later.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

