hadoop-mapreduce-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1016) Make the format of the Job History be JSON instead of Avro binary
Date Thu, 08 Oct 2009 18:13:31 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763602#action_12763602 ]

Hong Tang commented on MAPREDUCE-1016:

bq. I think we're over-engineering this. Can we take this to a separate issue?
Sure. In this jira, compared with the current code, I am only proposing to break the
first line (the version string) into two lines, add an empty line after the schema, and add
a few keywords. This would let the format be extended to a more elaborate schema in
the future without any special code for handling backward/forward compatibility.
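
To illustrate why a keyword-based header ending in an empty line is easy to extend, here is a minimal sketch of a reader for such a header. The keyword names (`Format`, `Version`, `Schema`, `Codec`), the `Keyword: value` layout, and the helper name `parse_history_header` are assumptions made up for this example; they are not taken from the attached patch.

```python
def parse_history_header(text):
    """Split a job-history file into (header fields, body).

    Assumed layout (hypothetical, for illustration only): one
    "Keyword: value" pair per line, terminated by the first empty line.
    Everything after the empty line is the event data.
    """
    header_part, _, body = text.partition("\n\n")
    fields = {}
    for line in header_part.splitlines():
        key, colon, value = line.partition(": ")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        fields[key] = value
    return fields, body


# Hypothetical file: the version string is split over two lines,
# followed by the schema, an empty line, and then the data.
sample = (
    "Format: Avro-Json\n"
    "Version: 1\n"
    'Schema: {"type": "record", "name": "Event", "fields": []}\n'
    "\n"
    '{"type": "JOB_SUBMITTED"}\n'
)
fields, body = parse_history_header(sample)

# A later writer adds a keyword; the same reader handles it unchanged.
extended = "Format: Avro-Json\nVersion: 2\nCodec: none\nSchema: {}\n\nx\n"
fields2, body2 = parse_history_header(extended)
```

Because the reader stops at the empty line and stores every keyword it sees, a header that gains extra keywords later still parses without version-specific branches.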

bq. That would require the parsing of schema string to figure out that the schema matches
across the files before merging, no ? 
Yes and no. The schema stored in the TFile would be a default one. If a job history has a schema
different from that default, then, for that entry, we can keep the "Schema: " line in the data.
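
A rough sketch of that default-plus-override idea follows. It assumes schemas are compared as exact strings (so no schema parsing is needed when merging) and that an override, when present, is the first line of the entry. The function names `pack_entry` and `unpack_entry` are hypothetical, and a plain string stands in for the container's default schema; no TFile API is used.

```python
SCHEMA_PREFIX = "Schema: "


def pack_entry(schema, data, default_schema):
    """Prepare one job history for storage in the merged container.

    The "Schema: " line is dropped when the entry's schema is identical
    (as a string) to the container default, and kept otherwise.
    """
    if schema == default_schema:
        return data
    return SCHEMA_PREFIX + schema + "\n" + data


def unpack_entry(entry, default_schema):
    """Return (schema, data) for a stored entry, falling back to the default."""
    first_line, _, rest = entry.partition("\n")
    if first_line.startswith(SCHEMA_PREFIX):
        return first_line[len(SCHEMA_PREFIX):], rest
    return default_schema, entry


default = '{"v": 1}'
same = pack_entry('{"v": 1}', '{"e": 1}\n', default)    # stored without a schema line
other = pack_entry('{"v": 2}', '{"e": 2}\n', default)   # keeps its own schema line
```

With this arrangement the common case costs one string comparison per file, and only entries that differ from the default carry their own schema.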

bq. Can we somehow make it available remotely via url ? 
One of the principal advantages of Avro over Protocol Buffers is that it is completely self-contained.
This seems a step backwards.

> Make the format of the Job History be JSON instead of Avro binary
> -----------------------------------------------------------------
>                 Key: MAPREDUCE-1016
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1016
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Doug Cutting
>             Fix For: 0.21.0, 0.22.0
>         Attachments: MAPREDUCE-1016.patch
> I forgot that one of the features that would be nice is to offload the job history display
from the JobTracker. That will be a lot easier if the job history is stored in JSON. Therefore,
I think we should change the storage now to prevent incompatibilities later.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
