avro-dev mailing list archives

From "Doug Cutting (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-923) Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration
Date Wed, 12 Oct 2011 16:45:12 GMT

    [ https://issues.apache.org/jira/browse/AVRO-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125964#comment-13125964
] 

Doug Cutting commented on AVRO-923:
-----------------------------------

> it seems to me this risk is already taken for other parameters such as "avro.mapper".
> For the case of schemas though there is a second check that occurs when the input file schema
> does not match the compiled schema.

The input schema is not what I was most concerned about, but rather the map output schema.  If
different tasks somehow got different map output schemas, it would result in strange, hard-to-debug
I/O exceptions.  We require that the map output schema be constant across all tasks in a job
for things to work correctly.  Of course, it's not always possible to prohibit folks from creating
erroneous situations.  We should try to discourage that, but we don't want to overly limit functionality
in the process.

> It can also be described with xml files

What I meant was that the xml files can be programmatically constructed.  They should ideally
not be constructed with cut and paste, but should use the same source for schemas as the Java
code that's getting re-generated to build the new version of the jar file.  Perhaps you can
refer to the schemas with an external entity definition in the XML that fetches the appropriate
version? 

{code}
<!DOCTYPE job [
<!ENTITY schemaX SYSTEM "http://svn.foo.com/project/trunk/schemas/x.avsc">
]>
<job>
 ... &schemaX; ...
</job>
{code}
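
As an alternative to the entity approach, here is a minimal sketch of constructing the job XML programmatically from the schema source. It assumes the usual Hadoop <configuration>/<property> layout and the "avro.input.schema" key from this issue. JobXmlWriter, escapeXml and toJobXml are hypothetical names for illustration only, not part of Avro or Hadoop, and the inline sample schema is made up.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Avro or Hadoop.
final class JobXmlWriter {

    // Escape the characters that are special in XML element content.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Render the properties in a <configuration>/<property> layout.
    static String toJobXml(Map<String, String> properties) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            xml.append("  <property>\n")
               .append("    <name>").append(escapeXml(e.getKey())).append("</name>\n")
               .append("    <value>").append(escapeXml(e.getValue())).append("</value>\n")
               .append("  </property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    public static void main(String[] args) throws Exception {
        // Read the schema from the same .avsc file the Java classes are
        // generated from; fall back to a made-up sample if no path is given.
        String schema = args.length > 0
            ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
            : "{\"type\":\"record\",\"name\":\"X\",\"fields\":[]}";
        Map<String, String> props = new LinkedHashMap<>();
        props.put("avro.input.schema", schema);
        System.out.print(toJobXml(props));
    }
}
```

Run as part of the build, something like this would keep the XML and the generated Java code tied to a single schema file, so nothing is copied by hand.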

                
> Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration
> ---------------------------------------------------------------------------------------
>
>                 Key: AVRO-923
>                 URL: https://issues.apache.org/jira/browse/AVRO-923
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.4
>         Environment: any
>            Reporter: Julien Muller
>             Fix For: 1.6.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The current implementation of Avro MapRed is designed to use JobConf. While it is possible
> to use a job.xml file, it is pretty painful since you have to copy/paste all the schemas for
> input and output. This is error-prone and time-consuming. Also, any update to a bean requires
> copying and pasting the schema again (if using JobConf, a simple recompile would be enough).
> A proposal to improve this while staying backward compatible would be to introduce new
> keys in AvroJob that reference the actual Avro bean used. This can be implemented as a fallback.
> New keys would be created:
> - avro.input.schema -> avro.input.class
> - avro.map.output.schema -> avro.map.output.class
> - avro.output.schema -> avro.output.class
> Only 3 methods would be impacted in AvroJob:
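> - getInputSchema(Configuration job) {
> 	// Use the configured schema if present, otherwise fall back to the class
> 	String s = job.get(INPUT_SCHEMA);
> 	if (s != null) return Schema.parse(s);
> 	try {
> 	    // SCHEMA$ on a generated class is a static Schema, not a String
> 	    return (Schema) Class.forName(job.get(INPUT_CLASS)).getDeclaredField("SCHEMA$").get(null);
> 	} catch (Exception e) {
> 	    throw new RuntimeException(e);
> 	}
>   }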
> - getMapOutputSchema()
> - getOutputSchema()
> Also, it would be more consistent to add new setters. This is not mandatory, since in
> that use case the new keys are set directly in the job configuration, not through AvroJob.
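
The fallback lookup described in the issue can be sketched without any Avro or Hadoop dependency. This is an illustration under stated assumptions, not the proposed patch: a plain Map stands in for the job configuration, ExampleRecord is a made-up stand-in for a generated class, and SchemaFallback and resolveSchema are hypothetical names. Real generated classes expose SCHEMA$ as an org.apache.avro.Schema, so the stand-in uses a String only to stay self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Made-up stand-in for an Avro-generated class. Real generated classes
// declare SCHEMA$ as an org.apache.avro.Schema rather than a String.
final class ExampleRecord {
    public static final String SCHEMA$ =
        "{\"type\":\"record\",\"name\":\"ExampleRecord\",\"fields\":[]}";
}

// Hypothetical helper illustrating the fallback; not part of AvroJob.
final class SchemaFallback {
    static final String INPUT_SCHEMA = "avro.input.schema";
    static final String INPUT_CLASS = "avro.input.class";

    // Return the schema text stored under schemaKey, or, if that key is
    // absent, the static SCHEMA$ field of the class named under classKey.
    static String resolveSchema(Map<String, String> conf, String schemaKey, String classKey) {
        String schema = conf.get(schemaKey);
        if (schema != null) {
            return schema;
        }
        String className = conf.get(classKey);
        if (className == null) {
            throw new IllegalStateException("Neither " + schemaKey + " nor " + classKey + " is set");
        }
        try {
            // Static field, so the receiver passed to get() is null.
            Object value = Class.forName(className).getField("SCHEMA$").get(null);
            return String.valueOf(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read SCHEMA$ from " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(INPUT_CLASS, ExampleRecord.class.getName());
        System.out.println(resolveSchema(conf, INPUT_SCHEMA, INPUT_CLASS));
    }
}
```

The explicit schema key takes precedence, so existing jobs that set the schema directly would behave as before. The class key is consulted only when the schema key is missing.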

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
