spark-reviews mailing list archives

From vanzin <...@git.apache.org>
Subject [GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Date Thu, 13 Oct 2016 18:33:35 GMT
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15009#discussion_r83280861
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -1189,6 +1217,10 @@ private[spark] class Client(
     private object Client extends Logging {
     
       def main(argStrings: Array[String]) {
    +    mainWithEnv(argStrings, Map() ++ sys.env)
    +  }
    +
    +  def mainWithEnv(argStrings: Array[String], env: Map[String, String]): Unit = {
    --- End diff --
    
    Something bothers me about this `env` argument: what is it supposed to be?
The name suggests a custom `sys.env`, but the code that calls this method tells
me otherwise. Reading the rest of the code, it seems you're using it both for `SparkConf` entries
and as an override for `sys.env`; I think it would be better to have separate arguments.
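    
    A minimal sketch of what separate arguments could look like. The object name,
`mainWithConf`, `effectiveEnv`, and both parameter names are hypothetical and not taken from the PR;
the body only prints sizes to keep the example self-contained:
    
    ```scala
    // Hypothetical sketch: none of these names come from the PR under review.
    object ClientSketch {
      // The real process environment with explicit overrides applied on top.
      def effectiveEnv(envOverrides: Map[String, String]): Map[String, String] =
        sys.env ++ envOverrides
    
      def main(argStrings: Array[String]): Unit =
        mainWithConf(argStrings, sparkConf = Map.empty, envOverrides = Map.empty)
    
      // SparkConf entries and environment overrides travel in separate arguments,
      // so a reader does not have to guess what a given key is meant for.
      def mainWithConf(
          argStrings: Array[String],
          sparkConf: Map[String, String],
          envOverrides: Map[String, String]): Unit = {
        val env = effectiveEnv(envOverrides)
        println(s"args=${argStrings.length}, conf=${sparkConf.size}, env=${env.size}")
      }
    }
    ```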
    
    Also, I think it would be better to create an explicit interface for this. Like a trait
that defines a `sparkMain` method that takes app args and Spark-specific args. Something like:
    
    ```
    trait SparkApp {
      this: Singleton =>
    
      def sparkMain(args: Array[String], conf: Map[String, String]): Int
    }
    ```
    
    Thinking about the future, it might even be worth making that trait
a public interface, with `conf` being a `SparkConf` (although there are a few complications
w.r.t. logging before that can happen). That would solve at least a couple of problems:
    
    - have an explicit interface for Spark apps instead of overloading Java's main()
    - have an explicit exit code for Spark apps (see how messy that is with yarn-cluster mode
currently)
    
    For your particular change, that trait can remain `private[spark]`, and if the class being
run does not implement it, you can throw an exception when launching an app in-process, kinda
like your current code. But I think having an explicit interface for this would make
your approach both easier to understand and easier to extend to other cluster managers / apps
in the future.
    
    If you add the trait, you can also simplify the code in `SparkSubmit` by having an
implementation of the trait that wraps a regular app that just has a `main(String[])` method.
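    
    A rough sketch of such a wrapper, under stated assumptions: `JavaMainApp` is a hypothetical
name, the `Singleton` self-type from the trait above is left out so that a plain class can implement it,
passing `conf` through system properties is an assumption, and the 0/1 exit codes are only a
placeholder convention:
    
    ```scala
    import java.lang.reflect.InvocationTargetException
    
    // The proposed interface, repeated here (without the self-type) to keep the sketch self-contained.
    trait SparkApp {
      def sparkMain(args: Array[String], conf: Map[String, String]): Int
    }
    
    // Hypothetical adapter: exposes a class that only has a static main(String[]) as a SparkApp.
    class JavaMainApp(klass: Class[_]) extends SparkApp {
      override def sparkMain(args: Array[String], conf: Map[String, String]): Int = {
        // Assumption: a plain app picks up its configuration from system properties.
        conf.foreach { case (k, v) => System.setProperty(k, v) }
        // Throws NoSuchMethodException if the class has no main(String[]) method.
        val mainMethod = klass.getMethod("main", classOf[Array[String]])
        try {
          // The receiver is null because main is static; args is passed as the single parameter.
          mainMethod.invoke(null, args)
          0
        } catch {
          case e: InvocationTargetException =>
            // The app's own exception is wrapped by reflection; report its cause and signal failure.
            Option(e.getCause).getOrElse(e).printStackTrace()
            1
        }
      }
    }
    ```
    
    With something like this, the caller only ever deals with `SparkApp`, and a regular app
still gets an explicit exit code rather than relying on `System.exit` or an escaping exception.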

