spark-user mailing list archives

From Peter Halliday <pjh...@cornell.edu>
Subject Re: Get rid of FileAlreadyExistsError
Date Tue, 01 Mar 2016 16:12:28 GMT
http://pastebin.com/vbbFzyzb

The problem seems to be twofold. First, the ParquetFileWriter in Hadoop allows an
overwrite flag that Spark doesn’t allow to be set. Second, the
DirectParquetOutputCommitter has an abortTask that’s empty. I see SPARK-8413 open
on this too, but with no plans to change it. I’m surprised this hasn’t been fixed yet.
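
For context, here is a rough sketch of the two pieces as I understand them. First,
parquet-mr’s ParquetFileWriter takes a write mode, where OVERWRITE replaces an
existing file and CREATE fails if one is already there; the schema and path below
are just placeholders for illustration:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileWriter
import org.apache.parquet.schema.MessageTypeParser

val conf = new Configuration()
// Placeholder schema and path, only to illustrate the constructor.
val schema = MessageTypeParser.parseMessageType(
  "message example { required int32 id; }")

// Mode.CREATE fails if the destination already exists;
// Mode.OVERWRITE replaces it. Spark’s write path doesn’t let the
// caller pick OVERWRITE here.
val writer = new ParquetFileWriter(
  conf, schema, new Path("/tmp/example.parquet"),
  ParquetFileWriter.Mode.OVERWRITE)

Second, because DirectParquetOutputCommitter writes straight to the final
location, there’s no task-attempt directory to roll back, which is presumably why
its abortTask is a no-op. Roughly the shape of it (a sketch, not Spark’s actual
source):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.parquet.hadoop.ParquetOutputCommitter

// Sketch only: tasks write directly to the final output path,
// so there is no staging directory to commit from or clean up.
class DirectCommitterSketch(output: Path, context: TaskAttemptContext)
    extends ParquetOutputCommitter(output, context) {

  // Nothing to move into place: data is already at the final path.
  override def commitTask(context: TaskAttemptContext): Unit = {}

  // This empty abortTask is the problem: files written by a failed
  // attempt stay at the final path, and the retried attempt then hits
  // FileAlreadyExistsException when it tries to recreate them.
  override def abortTask(context: TaskAttemptContext): Unit = {}
}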

Peter Halliday 



> On Mar 1, 2016, at 10:01 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> 
> Do you mind pastebin'ning the stack trace with the error so that we know
> which part of the code is under discussion?
> 
> Thanks
> 
> On Tue, Mar 1, 2016 at 7:48 AM, Peter Halliday <pjh239@cornell.edu> wrote:
> I have a Spark application where a Task seems to fail but has actually
> written out some of the files that were assigned to it. Spark then assigns
> the task to another executor, which gets a FileAlreadyExistsException. The
> Hadoop code seems to allow files to be overwritten, but I see the 1.5.1
> version of this code doesn’t allow that to be passed in. Is that correct?
> 
> Peter Halliday

