avro-dev mailing list archives

From "Alan Paulsen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AVRO-1356) AvroMultipleOutputs map only jobs do not use NamedOutput schemas
Date Sun, 21 Jul 2013 14:38:48 GMT
Alan Paulsen created AVRO-1356:
----------------------------------

             Summary: AvroMultipleOutputs map only jobs do not use NamedOutput schemas
                 Key: AVRO-1356
                 URL: https://issues.apache.org/jira/browse/AVRO-1356
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.7.4
            Reporter: Alan Paulsen
             Fix For: 1.7.5


When running a map-only job, AvroMultipleOutputs sets the MapOutputKeySchema and MapOutputValueSchema rather than the output schemas, as follows:

{code:java}
boolean isMaponly = job.getNumReduceTasks() == 0;
if (keySchema != null) {
  if (isMaponly)
    AvroJob.setMapOutputKeySchema(job, keySchema);
  else
    AvroJob.setOutputKeySchema(job, keySchema);
}
if (valSchema != null) {
  if (isMaponly)
    AvroJob.setMapOutputValueSchema(job, valSchema);
  else
    AvroJob.setOutputValueSchema(job, valSchema);
}
{code}

Unfortunately, AvroKeyOutputFormat and AvroKeyValueOutputFormat never check whether the job is
map-only; they use the OutputKeySchema and OutputValueSchema regardless.
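
The mismatch can be illustrated with a toy model in which a plain Map stands in for the Hadoop job configuration. The class name, configuration keys, and helper methods below are all hypothetical; they are not the Avro API and only mimic the behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

class SchemaMismatchDemo {
    // Hypothetical configuration keys, not the real Avro/Hadoop property names.
    static final String MAP_OUTPUT_KEY_SCHEMA = "demo.map.output.key.schema";
    static final String OUTPUT_KEY_SCHEMA = "demo.output.key.schema";

    // Mimics the AvroMultipleOutputs snippet: a map-only job stores the
    // schema under the map-output key instead of the output key.
    static void storeKeySchema(Map<String, String> conf, int numReduceTasks, String keySchema) {
        boolean isMapOnly = numReduceTasks == 0;
        if (keySchema != null) {
            conf.put(isMapOnly ? MAP_OUTPUT_KEY_SCHEMA : OUTPUT_KEY_SCHEMA, keySchema);
        }
    }

    // Mimics the output formats as described: the output key schema is
    // always read, whether or not the job is map-only.
    static String schemaSeenByOutputFormat(Map<String, String> conf) {
        return conf.get(OUTPUT_KEY_SCHEMA);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        storeKeySchema(conf, 0, "\"string\""); // map-only job: zero reduce tasks
        // The lookup misses because the schema was stored under the other key.
        System.out.println(schemaSeenByOutputFormat(conf)); // prints "null"
    }
}
```

In this model the writer and the reader disagree about which key holds the schema as soon as the number of reduce tasks is zero.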

We can fix this by either:
* Changing AvroKeyOutputFormat and AvroKeyValueOutputFormat to check whether the job is map-only
and use the appropriate schema (this seems like the right approach), or
* Changing AvroMultipleOutputs to always use the OutputKeySchema and OutputValueSchema.
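
A minimal sketch of the first option, again using a plain Map as a stand-in for the job configuration. The class MapOnlyAwareLookup, its keys, and the default of one reduce task are assumptions made for illustration; an actual patch would work against the real job context and AvroJob accessors instead.

```java
import java.util.HashMap;
import java.util.Map;

class MapOnlyAwareLookup {
    // Hypothetical configuration keys, not the real Avro/Hadoop property names.
    static final String NUM_REDUCES = "demo.reduces";
    static final String MAP_OUTPUT_KEY_SCHEMA = "demo.map.output.key.schema";
    static final String OUTPUT_KEY_SCHEMA = "demo.output.key.schema";

    // Option 1: the output format checks whether the job is map-only and
    // reads the schema from the matching key.
    static String resolveKeySchema(Map<String, String> conf) {
        // Assumption: an unset reduce count is treated as one reduce task.
        int numReduceTasks = Integer.parseInt(conf.getOrDefault(NUM_REDUCES, "1"));
        boolean isMapOnly = numReduceTasks == 0;
        return conf.get(isMapOnly ? MAP_OUTPUT_KEY_SCHEMA : OUTPUT_KEY_SCHEMA);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(NUM_REDUCES, "0");
        conf.put(MAP_OUTPUT_KEY_SCHEMA, "\"string\"");
        // The map-only job now resolves the schema that was stored for it.
        System.out.println(resolveKeySchema(conf)); // prints "string" including the quotes
    }
}
```

The second option would instead leave the lookup untouched and change the writer side so that both job types store the schema under the output key.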



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
