spark-dev mailing list archives

From mridulm <...@git.apache.org>
Subject [GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...
Date Fri, 21 Feb 2014 06:50:21 GMT
Github user mridulm commented on the pull request:

    https://github.com/apache/incubator-spark/pull/626#issuecomment-35703455
  
    Typically, this is done by writing to a temporary directory, taking care of
multiple attempts for the same partition (the failure case) and multiple concurrent
executions on the same partition (the speculative execution case). Once the job is done,
the output is moved to the desired destination (or the directory is deleted if the job
fails), like what mapred does, for example.
    (Moves are atomic NameNode (NN) operations.)
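
A minimal sketch of the write-to-temporary-then-move pattern described above, for illustration only. It assumes a local POSIX filesystem standing in for HDFS, so `os.rename` / `os.replace` play the role of the atomic NameNode move. The names `run_job`, `write_attempt`, `commit_attempt`, and the `_temporary_` / `_attempt_` prefixes are hypothetical and are not taken from Spark or mapred.

```python
import os
import shutil
import uuid


def write_attempt(staging, part_id, attempt_id, records):
    """Write one attempt's output to a file private to that attempt."""
    attempt_path = os.path.join(staging, f"_attempt_{part_id:05d}_{attempt_id}")
    with open(attempt_path, "w") as f:
        for record in records:
            f.write(f"{record}\n")
    return attempt_path


def commit_attempt(staging, part_id, attempt_path):
    """Publish a finished attempt under the partition's final name.

    os.replace is atomic on a single POSIX filesystem, so a retried or
    speculative attempt for the same partition swaps in one complete file
    for another and never exposes a half-written one.
    """
    os.replace(attempt_path, os.path.join(staging, f"part-{part_id:05d}"))


def run_job(dest, partitions):
    """Write every partition under a staging dir, then move it to dest.

    `partitions` maps a partition id to an iterable of records
    (a stand-in for real task output).
    """
    if os.path.exists(dest):
        raise FileExistsError(dest)
    # Keep staging next to dest so the final rename stays on one filesystem.
    parent = os.path.dirname(os.path.abspath(dest))
    staging = os.path.join(parent, "_temporary_" + uuid.uuid4().hex)
    os.makedirs(staging)
    try:
        for part_id, records in partitions.items():
            attempt = write_attempt(staging, part_id, uuid.uuid4().hex, records)
            commit_attempt(staging, part_id, attempt)
        # Single move: dest appears only once all partitions are complete.
        os.rename(staging, dest)
    except Exception:
        # Job failed: delete the staging dir so nothing partial is left over.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

With this layout, a reader of `dest` either sees no directory at all or a complete one. A real HDFS version would go through the Hadoop `FileSystem` rename call instead, and its failure semantics differ from the local calls used here.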
    
    So when the output directory is "done", it is fully done: not partial, not in progress, etc.
    In particular, the bug mentioned, leftover files from previous jobs, is just scary!


