spark-reviews mailing list archives

From foxish <...@git.apache.org>
Subject [GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes
Date Thu, 19 Jul 2018 13:22:39 GMT
Github user foxish commented on the issue:

    https://github.com/apache/spark/pull/21067
  
    > ReadWriteOnce storage can only be attached to one node.
    
    This is well known. Using the RWO volume for fencing would work here, but that setup is
not representative of all users. It breaks down once checkpointing goes to object storage
(S3), HDFS, or ReadWriteMany volumes such as NFS. None of those limit access to a single node,
so in all of those cases there will be a correctness problem. 
    
    For folks that need it right away, the same restart behavior can be had safely, and without
any of this hassle, using an approach like the [spark-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
So why are we trying to fit this into Spark with caveats around how volumes must be used to
ensure fencing? That seems more error-prone and harder to explain, and I can't see the gain
from it. One way forward is to propose to the k8s community a new option on Jobs that
lets us get fencing from the k8s apiserver through deterministic names. I think that would be
a good approach. 
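
To illustrate what fencing through deterministic names could look like, here is a minimal sketch. It relies on one property of the k8s apiserver: creating an object whose name already exists in the namespace is rejected. Everything else is an assumption made for illustration. `FakeApiServer`, `AlreadyExists`, `try_start_driver`, and the `<app_id>-driver` naming scheme are hypothetical, and the in-memory dict only stands in for the apiserver; this is not the Kubernetes client API, nor what the PR implements.

```python
class AlreadyExists(Exception):
    """Raised when an object with the same name is already registered."""


class FakeApiServer:
    """Hypothetical in-memory stand-in for a name-keyed object store.

    A real apiserver performs this check-and-create atomically; this toy
    version is single-threaded and only models the name-uniqueness rule.
    """

    def __init__(self):
        self._pods = {}

    def create_pod(self, name, owner):
        # Reject the create if the name is already taken.
        if name in self._pods:
            raise AlreadyExists(name)
        self._pods[name] = owner

    def delete_pod(self, name):
        self._pods.pop(name, None)

    def owner_of(self, name):
        return self._pods.get(name)


def try_start_driver(api, app_id, attempt_owner):
    """Try to claim the driver slot for app_id; return True on success."""
    # Deterministic name (assumed scheme): every attempt for the same
    # application competes for the same object name.
    name = f"{app_id}-driver"
    try:
        api.create_pod(name, attempt_owner)
        return True
    except AlreadyExists:
        # Another driver still holds the name, so this attempt is fenced off.
        return False
```

Because the name is derived from the application rather than generated per attempt, a replacement driver can only start once the old object is gone, regardless of where checkpoints are stored.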


---


