avro-dev mailing list archives

From "Ashish Nagavaram (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-1215) AvroMultipleOutputs not working when specifying baseOutputPath
Date Fri, 07 Dec 2012 17:31:23 GMT

     [ https://issues.apache.org/jira/browse/AVRO-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Nagavaram updated AVRO-1215:
-----------------------------------

    Status: Patch Available  (was: Open)

Patch with the necessary changes, which also updates TestAvroMultipleOutputs to test them.
The write(key, value, baseOutputPath) method will use the job's default output schema to write the output.
                
> AvroMultipleOutputs not working when specifying baseOutputPath
> --------------------------------------------------------------
>
>                 Key: AVRO-1215
>                 URL: https://issues.apache.org/jira/browse/AVRO-1215
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.2
>            Reporter: Matthew Hayes
>              Labels: avro, mapreduce
>
> I'm calling the write() method of AvroMultipleOutputs that takes a baseOutputPath. The reducer
> appears to hang once it tries writing to a baseOutputPath value it has not already encountered.
> It then fails with:
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file ... because current leaseholder is trying to recreate file.
> I think the problem has to do with this line in AvroMultipleOutputs:
> {code}
> // get the record writer from context output format
> //FileOutputFormat.setOutputName(taskContext, baseFileName);
> {code}
> This line is not commented out in the similar code from Hadoop, so I think the baseOutputPath
> is ignored. As a result, each record writer is created with the same path, which leads
> to the exception.
> Uncommenting this line does not work because of the method's visibility. All the method
> does, however, is set "mapreduce.output.basename", and setting that directly doesn't work either.
> After digging through the Avro code I found that AvroOutputFormatBase uses "avro.mo.config.namedOutput"
> to create the path. If I replace the commented-out line with this, it seems to work:
> {code}
> taskContext.getConfiguration().set("avro.mo.config.namedOutput", baseFileName);  
> {code}
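
The failure mode and the workaround described above can be illustrated with a small, self-contained sketch. This is a toy model only, not Avro's or Hadoop's actual code: a plain Map stands in for the Hadoop Configuration, and the class name NamedOutputSketch, the helper methods, the default name "part", and the "-r-00000.avro" suffix are all assumptions made up for the illustration. Only the key "avro.mo.config.namedOutput" is taken from the report.

```java
import java.util.HashMap;
import java.util.Map;

public class NamedOutputSketch {
    // Configuration key quoted in the issue description.
    static final String NAMED_OUTPUT_KEY = "avro.mo.config.namedOutput";

    // Hypothetical stand-in for the output format's path resolution: the file
    // name is read from the configuration (the default "part" is an assumption).
    static String resolveOutputFile(Map<String, String> conf) {
        return conf.getOrDefault(NAMED_OUTPUT_KEY, "part") + "-r-00000.avro";
    }

    // Models the reported behaviour: baseOutputPath is deliberately unused,
    // so it never reaches the configuration the writer is built from.
    static String pathIgnoringBase(Map<String, String> jobConf, String baseOutputPath) {
        Map<String, String> taskConf = new HashMap<>(jobConf);
        return resolveOutputFile(taskConf);
    }

    // Models the workaround: copy baseOutputPath into the key before resolving.
    static String pathWithWorkaround(Map<String, String> jobConf, String baseOutputPath) {
        Map<String, String> taskConf = new HashMap<>(jobConf);
        taskConf.put(NAMED_OUTPUT_KEY, baseOutputPath);
        return resolveOutputFile(taskConf);
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();

        // Two different base paths collapse to the same file name.
        System.out.println(pathIgnoringBase(jobConf, "2012/12/06"));
        System.out.println(pathIgnoringBase(jobConf, "2012/12/07"));

        // With the key set, each base path gets its own file name.
        System.out.println(pathWithWorkaround(jobConf, "2012/12/06"));
        System.out.println(pathWithWorkaround(jobConf, "2012/12/07"));
    }
}
```

In this model the first two calls return the same name, which corresponds to a second writer trying to create a file that is already being written, while the last two return distinct names.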

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
