spark-reviews mailing list archives

From gatorsmile <...@git.apache.org>
Subject [GitHub] spark pull request #18714: [SPARK-20236][SQL] dynamic partition overwrite
Date Wed, 03 Jan 2018 02:07:51 GMT
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18714#discussion_r159352867
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
    @@ -39,8 +39,19 @@ import org.apache.spark.mapred.SparkHadoopMapRedUtil
      *
      * @param jobId the job's or stage's id
      * @param path the job's output path, or null if committer acts as a noop
    + * @param dynamicPartitionOverwrite If true, Spark will overwrite partition directories at runtime
    + *                                  dynamically, i.e., we first write files under a staging
    + *                                  directory with partition path, e.g.
    + *                                  /path/to/staging/a=1/b=1/xxx.parquet. When committing the job,
    + *                                  we first clean up the corresponding partition directories at
    + *                                  destination path, e.g. /path/to/destination/a=1/b=1, and move
    + *                                  files from staging directory to the corresponding partition
    + *                                  directories under destination path.
      */
    -class HadoopMapReduceCommitProtocol(jobId: String, path: String)
    +class HadoopMapReduceCommitProtocol(
    +     jobId: String,
    +     path: String,
    +     dynamicPartitionOverwrite: Boolean = false)
    --- End diff --
    
    Indents. The constructor parameters above are indented by five spaces; Spark's Scala style uses four.
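
    For readers following the thread, the commit sequence described in the new Scaladoc (write under a staging directory, then clear each touched partition directory at the destination and move the staged files in) can be sketched roughly as follows. This is not the code from the pull request: it uses `java.nio.file` on a local filesystem rather than Hadoop's filesystem API, and the object name `DynamicPartitionOverwriteSketch` and its methods are hypothetical names chosen for illustration.

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator
import scala.collection.JavaConverters._

// Hypothetical illustration only; names and structure are assumptions,
// not the implementation in HadoopMapReduceCommitProtocol.
object DynamicPartitionOverwriteSketch {

  /** Recursively delete `dir` if it exists, children before parents. */
  def deleteRecursively(dir: Path): Unit = {
    if (Files.exists(dir)) {
      val stream = Files.walk(dir)
      try {
        stream.sorted(Comparator.reverseOrder[Path]())
          .iterator().asScala
          .foreach(p => Files.delete(p))
      } finally stream.close()
    }
  }

  /** Relative paths (e.g. a=1/b=1) of staged directories that contain files. */
  def stagedPartitions(staging: Path): Set[Path] = {
    val stream = Files.walk(staging)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        // Assumption: files sit inside partition directories, not directly
        // under the staging root, so those are skipped here.
        .filter(p => p.getParent != staging)
        .map(p => staging.relativize(p.getParent))
        .toSet
    } finally stream.close()
  }

  /** Overwrite only the partitions that were written to `staging`. */
  def commitJob(staging: Path, destination: Path): Unit = {
    for (partition <- stagedPartitions(staging)) {
      val target = destination.resolve(partition)
      // 1. Clean up the corresponding partition directory at the destination.
      deleteRecursively(target)
      // 2. Move the staged partition directory into place. Moving a whole
      //    directory like this assumes both paths are on the same file store.
      Files.createDirectories(target.getParent)
      Files.move(staging.resolve(partition), target)
    }
    // Partitions that were not staged are left untouched at the destination.
    deleteRecursively(staging)
  }
}
```

    With a staged file at /path/to/staging/a=1/b=1/xxx.parquet, calling `commitJob` in this sketch would replace only /path/to/destination/a=1/b=1 and leave sibling partitions such as a=2/b=1 as they were.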

