spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-25874) Simplify abstractions in the K8S backend
Date Mon, 29 Oct 2018 22:47:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Marcelo Vanzin updated SPARK-25874:
-----------------------------------
    Issue Type: Umbrella  (was: Improvement)

> Simplify abstractions in the K8S backend
> ----------------------------------------
>
>                 Key: SPARK-25874
>                 URL: https://issues.apache.org/jira/browse/SPARK-25874
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Kubernetes
>    Affects Versions: 2.4.0
>            Reporter: Marcelo Vanzin
>            Priority: Major
>
> I spent some time recently re-familiarizing myself with the k8s backend, and I think
there is room for improvement. In the past, SPARK-22839 was added which improved things a
lot, but it is still hard to follow the code, and it is still more complicated than it should
be to add a new feature.
> I've worked on the main things that were bothering me and came up with these changes:
> https://github.com/vanzin/spark/commits/k8s-simple
> Now that patch (first commit of the branch) is a little large, which makes it hard to
review and to properly assess what it is doing. So I plan to break it down into a few steps
that I will file as sub-tasks and send for review independently.
> The commit message for that patch has a lot of the background of what I changed and why.
Since I plan to delete that branch after the work is done, I'll paste it here:
> {noformat}
> There are two main changes happening here.
> (1) Simplify the KubernetesConf abstraction.
> The current code around KubernetesConf has a few drawbacks:
> - it uses composition (with a type parameter) for role-specific configuration
> - it breaks encapsulation of the user configuration, held in SparkConf, by
>   requiring that all the k8s-specific info is extracted from SparkConf before
>   the KubernetesConf object is created.
> - the above is usually done by parsing the SparkConf info into k8s-backend
>   types, which are then transformed into k8s requests.
> This ends up requiring a whole lot of code that is just not necessary.
> The type parameters make parameter and class declarations full of needless
> noise; the breakage of encapsulation makes the code that processes SparkConf
> and the code that builds the k8s descriptors live in different places, and
> the intermediate representation isn't adding much value.
> By using inheritance instead of the current model, role-specific
> specialization of certain config properties works simply by implementing some
> abstract methods of the base class (instead of breaking encapsulation), and
> there's no need anymore for parameterized types.
> By moving config processing to the code that actually transforms the config
> into k8s descriptors, a lot of intermediate boilerplate can be removed.
> This leads to...
> (2) Make all feature logic part of the feature step itself.
> Currently there's code in a lot of places to decide whether a feature
> should be enabled. There's code when parsing the configuration, building
> the custom intermediate representation in a way that is later used by
> different code in a builder class, which then decides whether feature A
> or feature B should be used.
> Instead, it's much cleaner to let a feature decide things for itself.
> If the config to enable feature A exists, it proceses the config and
> sets up the necessary k8s descriptors. If it doesn't, the feature is
> a no-op.
> This simplifies the shared code that calls into the existing features
> a lot. And does not make the existing features any more complicated.
> As part of this I merged the different language binding feature steps
> into a single step. They are sort of related, in the sense that if
> one is applied the others shouldn't, and merging them makes the logic
> to implement that cleaner.
> The driver and executor builders are now also much simpler, since they
> have no logic about what steps to apply or not. The tests were removed
> because of that, and some new tests were added to the suites for
> specific features, to verify what the old builder suites were testing.
> On top of the above I made a few minor changes (in comparison):
> - KubernetesVolumeUtils was modified to just throw exceptions. The old
>   code tried to avoid throwing exceptions by collecting results in `Try`
>   objects. That was not achieving anything since all the callers would
>   just call `get` on those objects, and the first one with a failure
>   would just throw the exception. The new code achieves the same
>   behavior and is simpler.
> - A bunch of small things, mainly to bring the code in line with the
>   usual Spark code style. I also removed unnecessary mocking in tests,
>   unused imports, and unused configs and constants.
> - Added some basic tests for KerberosConfDriverFeatureStep.
> Note that there may still be leftover intermediate types that could
> potentially be removed. But at this point the change is already pretty
> large as it is.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message