spark-dev mailing list archives

From Matt Cheah <mch...@palantir.com>
Subject Re: Spark on Kubernetes Builder Pattern Design Document
Date Mon, 05 Feb 2018 21:56:42 GMT
I think in this case, the original design that preceded the document was implemented
on the Spark on K8s fork, which we took some time to build separately before proposing that
the fork be merged into the main line.

 

Specifically, the timeline of events was:

 
1. We started building Spark on Kubernetes on a fork and were prepared to merge our work directly
   into master.
2. Discussion on https://issues.apache.org/jira/browse/SPARK-18278 led us to move down the path
   of working on a fork first. We would harden the fork and have it become more widely used
   to prove its value and robustness in practice. See https://github.com/apache-spark-on-k8s/spark
3. On said fork, we made the original design decision to use a step-based builder pattern for
   the driver but not the same design for the executors. This original discussion took place among
   the collaborators of the fork, as much of the work on the fork in general was not done on
   the mailing list.
4. We eventually decided to merge the fork into the main line, and got the feedback in the corresponding
   PRs.
 

Therefore the question may be less about this specific design and more about whether the overarching
approach we took - building Spark on K8s on a fork first before merging into mainline -
was the correct one in the first place. There’s also the issue that the work done on the
fork was isolated from the dev mailing list. Moving forward, as we push our work into mainline
Spark, we aim to be transparent with the Spark community via the Spark mailing list and Spark
JIRA tickets. We’re specifically aiming to deprecate the fork and migrate all the work done
on the fork into the main line.

 

-Matt Cheah

 

From: Mark Hamstra <mark@clearstorydata.com>
Date: Monday, February 5, 2018 at 1:44 PM
To: Matt Cheah <mcheah@palantir.com>
Cc: "dev@spark.apache.org" <dev@spark.apache.org>, "ramanathana@google.com" <ramanathana@google.com>,
Ilan Filonenko <if56@cornell.edu>, Erik <eje@redhat.com>, Marcelo Vanzin <vanzin@cloudera.com>
Subject: Re: Spark on Kubernetes Builder Pattern Design Document

 

That's good, but you should probably stop and consider whether the discussions that led up
to this document's creation could have taken place on this dev list -- because if they could
have, then they probably should have as part of the whole spark-on-k8s project becoming part
of mainline spark development, not a separate fork. 

 

On Mon, Feb 5, 2018 at 1:17 PM, Matt Cheah <mcheah@palantir.com> wrote:

Hi everyone,

 

While we were building the Spark on Kubernetes integration, we realized that some of the abstractions
we introduced for building the driver application in spark-submit, and building executor pods
in the scheduler backend, could be improved for better readability and clarity. We received
feedback in this pull request in particular. In response to this feedback, we’ve
put together a design document that proposes a possible refactor to address the given feedback.

 

You may comment on the proposed design at this link: https://docs.google.com/document/d/1XPLh3E2JJ7yeJSDLZWXh_lUcjZ1P0dy9QeUEyxIlfak/edit#

 

I hope that we can have a productive discussion and continue improving the Kubernetes integration
further.

 

Thanks,

 

-Matt Cheah

 

