avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1452) Problem when using AvroMultipleOutputs with multiple schemas
Date Mon, 03 Feb 2014 22:17:06 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889988#comment-13889988 ]

Doug Cutting commented on AVRO-1452:

Can you please provide a complete test that demonstrates this problem?  Thanks!

> Problem when using AvroMultipleOutputs with multiple schemas
> ------------------------------------------------------------
>                 Key: AVRO-1452
>                 URL: https://issues.apache.org/jira/browse/AVRO-1452
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.7.6
>         Environment: Any Platform
>            Reporter: Vladislav Spivak
>              Labels: easyfix
> When using multiple named outputs with different key/value schemas, the last provided
> schema overrides all previous schema definitions after the first write attempt. This is
> caused by the following code in AvroMultipleOutputs.java:509
> /*begin*/
>     Job job = new Job(context.getConfiguration());
>    ...
>     setSchema(job, keySchema, valSchema);
>     taskContext = createTaskAttemptContext(
>       job.getConfiguration(), context.getTaskAttemptID());
> /*end*/
> Every time this code runs, the Configuration instance passed to createTaskAttemptContext
> remains the same, because the Job constructor creates a new configuration copy only if its
> argument is not an instance of JobConf. As a result, the "avro.schema.output.XXX" properties
> are overwritten each time a new TaskAttemptContext is initialised, and a single Configuration
> instance is mistakenly shared by all TaskAttemptContexts.
> Proposed fix:
> a) use "Job.getInstance(Configuration conf)" or
> b) call "new Job(new Configuration(context.getConfiguration()))"
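
The aliasing described in the report can be sketched without Hadoop on the classpath. The example below is only an illustration under stated assumptions: Config and TaskContext are hypothetical stand-ins (not the Hadoop Configuration or TaskAttemptContext classes), the helper names contextSharing and contextCopying are invented for this sketch, and the placeholder property name "avro.schema.output.XXX" is used literally as it appears in the report. It shows why two contexts built on one mutable configuration both end up with the last schema, and why copying the configuration first, in the spirit of proposed fix (b), keeps them apart.

```java
import java.util.HashMap;
import java.util.Map;

class SharedConfigSketch {
    // Placeholder property name, taken literally from the report.
    static final String KEY = "avro.schema.output.XXX";

    // Hypothetical stand-in for a mutable configuration object.
    static class Config {
        private final Map<String, String> props = new HashMap<>();

        Config() {}

        // Copy constructor: the new instance gets its own property map.
        Config(Config other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    // Hypothetical stand-in for a task context that keeps a reference to its config.
    static class TaskContext {
        private final Config conf;

        TaskContext(Config conf) { this.conf = conf; }

        String outputSchema() { return conf.get(KEY); }
    }

    // Mirrors the reported behaviour: the schema is written into the shared instance.
    static TaskContext contextSharing(Config base, String schema) {
        base.set(KEY, schema);
        return new TaskContext(base);
    }

    // Mirrors the idea of fix (b): copy the configuration before setting the schema.
    static TaskContext contextCopying(Config base, String schema) {
        Config copy = new Config(base);
        copy.set(KEY, schema);
        return new TaskContext(copy);
    }

    public static void main(String[] args) {
        Config shared = new Config();
        TaskContext first = contextSharing(shared, "schemaA");
        TaskContext second = contextSharing(shared, "schemaB");
        // Both contexts read from the same map, so the first schema is lost.
        System.out.println("shared: " + first.outputSchema() + ", " + second.outputSchema());

        Config base = new Config();
        TaskContext third = contextCopying(base, "schemaA");
        TaskContext fourth = contextCopying(base, "schemaB");
        // Each context owns its copy, so each keeps its own schema.
        System.out.println("copied: " + third.outputSchema() + ", " + fourth.outputSchema());
    }
}
```

A real reproduction would still need a job that writes two named outputs with different schemas through AvroMultipleOutputs; this sketch only isolates the shared-instance mechanism that the report blames.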

This message was sent by Atlassian JIRA
