spark-reviews mailing list archives

From tejasapatil <...@git.apache.org>
Subject [GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Date Mon, 04 Sep 2017 19:17:29 GMT
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/18975
  
    @gatorsmile : Yes. Hive is not 100% atomic, as things can go wrong between removing the old
data and renaming the staging location. But it's superior in these regards:
    
    - Hive would output "no data" OR "complete data". Here we can have "no data" OR "incomplete
data" OR "complete data". The "incomplete data" part worries me. A staging dir helps achieve
the "you either see nothing OR everything" behaviour.
    - The window of "you see nothing" is much bigger here than in Hive, because the output
location is cleaned up before execution rather than just before the final rename.
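
To make the staging-dir argument concrete, here is a minimal sketch on a local filesystem. It is not the code from this PR or from Hive: `write_via_staging` and its `fail_after` parameter are hypothetical names used only for illustration, and a real implementation would go through the Hadoop FileSystem API. The idea is that all files are written to a sibling staging directory first, so a failure mid-write leaves the old output untouched, and the "you see nothing" window shrinks to the gap between the delete and the rename.

```python
import os
import shutil
import tempfile


def write_via_staging(output_dir, files, fail_after=None):
    """Write `files` (name -> content) to a staging dir, then swap it in.

    `fail_after` is a hypothetical knob that simulates a task failing
    after that many files have been written.
    """
    parent = os.path.dirname(os.path.abspath(output_dir))
    # Stage next to the target so the final rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        for i, (name, content) in enumerate(files.items()):
            if fail_after is not None and i >= fail_after:
                raise RuntimeError("simulated task failure")
            with open(os.path.join(staging, name), "w") as f:
                f.write(content)
    except Exception:
        # Partial output never reaches output_dir; old data stays visible.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    # Still not 100% atomic: a crash between these two steps leaves
    # "no data", but never "incomplete data".
    if os.path.exists(output_dir):
        shutil.rmtree(output_dir)
    os.rename(staging, output_dir)
```

If the output location were cleaned up before the write loop instead, readers would see an empty or partially filled directory for the whole duration of the job, which is the behaviour the two points above object to.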
    