spark-reviews mailing list archives

From zheh12 <...@git.apache.org>
Subject [GitHub] spark pull request #21257: [SPARK-24194] [SQL]HadoopFsRelation cannot overwr...
Date Tue, 08 May 2018 03:24:44 GMT
Github user zheh12 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21257#discussion_r186608143
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
    @@ -207,9 +207,25 @@ case class InsertIntoHadoopFsRelationCommand(
         }
         // first clear the path determined by the static partition keys (e.g. /table/foo=1)
         val staticPrefixPath = qualifiedOutputPath.suffix(staticPartitionPrefix)
    -    if (fs.exists(staticPrefixPath) && !committer.deleteWithJob(fs, staticPrefixPath, true)) {
    -      throw new IOException(s"Unable to clear output " +
    -        s"directory $staticPrefixPath prior to writing to it")
    +
    +    // check if delete the dir or just sub files
    +    if (fs.exists(staticPrefixPath)) {
    +      // check if is he table root, and record the file to delete
    +      if (staticPartitionPrefix.isEmpty) {
    +        val files = fs.listFiles(staticPrefixPath, false)
    +        while (files.hasNext) {
    +          val file = files.next()
    +          if (!committer.deleteWithJob(fs, file.getPath, true)) {
    --- End diff --
    
    We chose to postpone the deletion. Whether or not `output` is the same as `input`,
    the `_temporary` directory is now created inside the `output` directory before the
    deletion runs, so the root directory can no longer be deleted directly.
    
    The original implementation could delete the root directory directly because the
    deletion happened before the job was created. The root directory was then rebuilt,
    and only afterwards was the `_temporary` directory created. The drawback is that, in
    the original implementation, the failure of any `task` in the `job` results in the
    loss of the `output` data.
    
    I can't figure out how to separate the two situations. Do you have any good ideas?
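
The trade-off described above can be illustrated with a minimal sketch. It is not the code from the pull request: `DeferredDeleteCommitter`, `markChildrenForDelete`, `commitJob`, and `abortJob` are hypothetical names, and `java.io.File` stands in for the Hadoop `FileSystem` so that the example runs without dependencies. The assumption is that the existing children of the output directory are recorded before `_temporary` is created, and that they are removed only once the job has succeeded.

```scala
import java.io.File
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the committer in the diff: it records paths
// instead of deleting them right away.
class DeferredDeleteCommitter {
  private val pendingDeletes = ArrayBuffer.empty[File]

  // Same idea as deleteWithJob in the diff: remember the path, report success.
  def deleteWithJob(path: File): Boolean = {
    pendingDeletes += path
    true
  }

  // Record each existing child of the output directory rather than the
  // directory itself, so a later _temporary directory is never a target.
  def markChildrenForDelete(outputDir: File): Unit =
    Option(outputDir.listFiles()).getOrElse(Array.empty[File])
      .foreach(child => deleteWithJob(child))

  // Job succeeded: the old files can now be removed safely.
  def commitJob(): Unit = {
    pendingDeletes.foreach(deleteRecursively)
    pendingDeletes.clear()
  }

  // Job failed: forget the recorded paths so the old output survives.
  def abortJob(): Unit = pendingDeletes.clear()

  // listFiles() returns null for plain files, hence the Option wrapper.
  private def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }
}
```

With this shape, calling `abortJob()` after a task failure leaves the previous contents of `output` untouched, whereas `commitJob()` removes only the files that existed before the job started and leaves `_temporary` in place.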



