hadoop-mapreduce-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1016) Make the format of the Job History be JSON instead of Avro binary
Date Wed, 07 Oct 2009 18:03:31 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763160#action_12763160 ]

Doug Cutting commented on MAPREDUCE-1016:

> Are you saying that maybe we will never need to have a "avro-json-2.0"?

I think it unlikely but not impossible.  If Avro incompatibly changed its schema format then
we'd probably change it to "Avro2-json" or somesuch.  We'd need to change the version string
if we changed the schema in some critical way that's impossible to detect by examining the
schema itself, but I have a hard time imagining a change like that, and we could always instead,
e.g., change the name or package of the top-level class in the schema to make such a change
easy to detect.  So it's unlikely but not impossible that we'll need to change the version
string, and, if we do, we can append a version number to it.  The version string already future-proofs
us sufficiently.
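
As a rough illustration of the scheme described above, here is a minimal sketch of how a reader
might split such a version string into its parts. The helper name, the regular expression and
the exact "name-encoding[-version]" layout are assumptions made for this example; they are not
taken from the patch.

```python
import re

# Assumed layout: "<system>-<encoding>" with an optional "-<version>" suffix,
# e.g. "Avro-Json", "Avro2-json" or "avro-json-2.0". This layout is hypothetical.
_VERSION_RE = re.compile(
    r"^(?P<system>[A-Za-z0-9]+)-(?P<encoding>[A-Za-z]+)(?:-(?P<version>\d+(?:\.\d+)*))?$"
)


def parse_version_string(line):
    """Split a header line into (system, encoding, version).

    version is None when no version number has been appended.
    """
    match = _VERSION_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % line)
    return (
        match.group("system").lower(),
        match.group("encoding").lower(),
        match.group("version"),
    )
```

Under these assumptions, a reader written today would still recognize a string with a version
number appended later, and could reject files whose system or encoding it does not understand.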

> Also, what is your opinion on whether job-history format would be fixed on a specific
> encoder permanently?

Nothing's permanent.  It seems possible we someday might want to permit binary too, so I added
the encoding to the version string.  But for now I think the proposal is to be json-only.

What seems more probable to me is that we might someday use a standard container format like
TFile or Avro's data file.  These can be differentiated by use of "magic": an Avro data file
always begins with 'AVR\0', the json file always begins with "Avro-Json\n", etc.  So it's
good that the file begins with a fixed string rather than just the schema, but exactly what
string it begins with is less critical.
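
To make the "magic" idea concrete, here is a minimal sketch of a reader that tells the formats
apart by their leading bytes. The two prefixes are the ones quoted above; the function name and
the returned labels are hypothetical and only serve the example.

```python
# Leading byte sequences as quoted in this message; treat them as illustrative.
AVRO_DATA_FILE_MAGIC = b"AVR\x00"
AVRO_JSON_MAGIC = b"Avro-Json\n"


def detect_history_format(header):
    """Classify a job-history file from its first bytes (hypothetical helper)."""
    if header.startswith(AVRO_DATA_FILE_MAGIC):
        return "avro-data-file"
    if header.startswith(AVRO_JSON_MAGIC):
        return "avro-json"
    # Anything else, including a bare schema with no fixed prefix, is unknown.
    return "unknown"
```

A caller would read the first few bytes of the file and dispatch on the result. A file that
began with just the schema would fall through to "unknown", which is why a fixed leading string
helps.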

> Make the format of the Job History be JSON instead of Avro binary
> -----------------------------------------------------------------
>                 Key: MAPREDUCE-1016
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1016
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Doug Cutting
>             Fix For: 0.21.0, 0.22.0
>         Attachments: MAPREDUCE-1016.patch
> I forgot that one of the features that would be nice is to offload the job history display
> from the JobTracker. That will be a lot easier if the job history is stored in JSON. Therefore,
> I think we should change the storage now to prevent incompatibilities later.

