avro-dev mailing list archives

From "Doug Cutting (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-923) Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration
Date Tue, 11 Oct 2011 17:35:12 GMT

    [ https://issues.apache.org/jira/browse/AVRO-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125213#comment-13125213 ]

Doug Cutting commented on AVRO-923:
-----------------------------------

It's slightly riskier to get the schema from the runtime than from the job, in particular
the map output schema.  If different versions of code are somehow run on different nodes,
then different map output schemas could be used, which would create havoc, since the schema
does not travel with the map output data.  When the schema is in the job.xml, there's very
little chance of a lack of coordination, since the framework distributes the same job.xml
to every task.  If the schema comes from the runtime, there's some chance that different versions
of classes could be installed on different nodes.

Another concern is that not all schemas have a class that defines them.  For example, one
might have jobs whose inputs or outputs are "bytes" or "string" or Pair<"string","bytes">,
etc.

These are the reasons that schema-in-job.xml is the required and preferred means of specification.
However, there may be cases where it's preferable to additionally support specification of
schemas via a specific class, as suggested in this issue.

A JobConf can be programmatically constructed.  Why is it so painful to insert the schema
there as a part of your job creation/submission pipeline?  I'd like to better understand why
that's so difficult before we add a new mechanism, since any added mechanism has the potential
to create bugs and user confusion.
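
To make the question concrete, here is a minimal sketch of resolving the schema once, at submission time, and storing its JSON text in the job configuration, so that every task reads the same string. It uses only the JDK so that it runs on its own: java.util.Properties stands in for JobConf, and the nested User class is a hypothetical stand-in for an Avro-generated class (real generated classes expose a static SCHEMA$ field of type org.apache.avro.Schema, whose toString() yields the schema JSON). The class and method names are illustrative assumptions, not part of AvroJob.

```java
import java.lang.reflect.Field;
import java.util.Properties;

public class SchemaAtSubmission {
    // Configuration key named in the quoted issue.
    static final String INPUT_SCHEMA = "avro.input.schema";

    /** Hypothetical stand-in for an Avro-generated class with a static SCHEMA$ field. */
    public static class User {
        public static final Object SCHEMA$ =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":[]}";
    }

    /** Reads the static SCHEMA$ field reflectively and returns its text form. */
    static String schemaJsonOf(Class<?> beanClass) throws ReflectiveOperationException {
        Field f = beanClass.getDeclaredField("SCHEMA$");
        return String.valueOf(f.get(null)); // null receiver: the field is static
    }

    /** Builds the job configuration on the submitting machine (Properties stands in for JobConf). */
    static Properties buildJob(Class<?> inputClass) throws ReflectiveOperationException {
        Properties job = new Properties();
        job.setProperty(INPUT_SCHEMA, schemaJsonOf(inputClass));
        return job;
    }

    public static void main(String[] args) throws Exception {
        Properties job = buildJob(User.class);
        System.out.println(job.getProperty(INPUT_SCHEMA));
    }
}
```

With real Avro classes, the same idea would presumably go through the existing AvroJob setters rather than a raw property. The point of the sketch is only that the class is consulted once, by the submitter, rather than separately on each node.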
                
> Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration
> ---------------------------------------------------------------------------------------
>
>                 Key: AVRO-923
>                 URL: https://issues.apache.org/jira/browse/AVRO-923
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.4
>         Environment: any
>            Reporter: Julien Muller
>             Fix For: 1.6.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The current implementation of Avro MapRed is designed to use JobConf. While it is possible
> to use a job.xml file, it is pretty painful, since you have to copy/paste all the schemas for
> input and output. This is error-prone and time-consuming. Also, any update to a bean requires
> re-copying and re-pasting the schema (if using JobConf, a simple recompile would be enough).
> A proposal to improve this while staying backward compatible would be to introduce new
> keys in AvroJob that reference the actual Avro bean used. This can be implemented as a fallback.
> New keys would be created:
> - avro.input.schema > avro.input.class
> - avro.map.output.schema > avro.map.output.class
> - avro.output.schema > avro.output.class
> Only 3 methods would be impacted in AvroJob:
> - getInputSchema(Configuration job) {
> 	// Implement a fallback like
> 	String s = job.get(INPUT_SCHEMA);
> 	if (s != null) return Schema.parse(s);
> 	// SCHEMA$ on a generated class is already a Schema; reflection exceptions are not handled here
> 	return (Schema) Class.forName(job.get(INPUT_CLASS)).getDeclaredField("SCHEMA$").get(null);
>   }
> - getMapOutputSchema()
> - getOutputSchema()
> Also, it would be more consistent to add new setters. This is not mandatory, since in
> that use case the new keys are set directly in the job, not through AvroJob.


        
