spark-dev mailing list archives

From Shane Huang <shannie.huang@gmail.com>
Subject Re: Propose to Re-organize the scripts and configurations
Date Sun, 22 Sep 2013 04:13:03 GMT
Done


On Sun, Sep 22, 2013 at 12:05 PM, Reynold Xin <rxin@cs.berkeley.edu> wrote:

> Thanks, Shane. Can you also link to this mailing list discussion from the
> JIRA ticket?
>
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
>
>
> > On Sat, Sep 21, 2013 at 9:01 PM, Shane Huang <shannie.huang@gmail.com> wrote:
>
> > I summarized the opinions about Config in this post and added a
> > comment on SPARK-544. Also posting it here below:
> >
> > 1) Define a Configuration class which contains all the options
> > available for a Spark application. A Configuration instance can be
> > de-/serialized from/to a formatted file. Most of us tend to agree
> > that the Typesafe Config library is a good choice for the
> > Configuration class.
> > 2) Each application (SparkContext) has one Configuration instance,
> > and it is initialized by the application which creates it (either
> > coded in the app (apps could explicitly read it from an IO stream or
> > from command-line arguments), or from system properties, or from env
> > vars).
> > 3) For an application the overriding rule should be code > system
> > properties > env vars (a rough sketch follows below this list). Over
> > time we will deprecate the env vars and maybe even system properties.
> > 4) When launching an Executor on a slave node, the Configuration is
> > first initialized using the node-local configuration file as the
> > default (instead of the env vars at present), and then the
> > Configuration passed from the application driver context will
> > override specific options in that default. Certain options in the
> > app's Configuration will always override those in the node-local
> > file, because these options need to be consistent across all the
> > slave nodes, e.g. spark.serializer; in this case, if any such option
> > is not set in the app's Config, a value will be provided by the
> > system. On the other hand, some options in the app's Config will
> > never override those in the node-local file, as they're not meant to
> > be set in the app, e.g. spark.local.dir.
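> >
> > A rough sketch of how the overriding rule in 3) could look on top of
> > the Typesafe Config library (the class and method names here are
> > illustrative only, not a final API):
> >
> >     import com.typesafe.config.{Config, ConfigFactory, ConfigRenderOptions}
> >
> >     // Hypothetical wrapper: one instance per SparkContext. With
> >     // withFallback the receiver wins, so options set in code override
> >     // system properties, which in turn override env vars.
> >     class SparkConfiguration(setInCode: Config) {
> >       val resolved: Config = setInCode
> >         .withFallback(ConfigFactory.systemProperties())
> >         .withFallback(ConfigFactory.systemEnvironment())
> >
> >       def get(key: String): String = resolved.getString(key)
> >
> >       // Serialization to a formatted file, as in 1).
> >       def render: String =
> >         resolved.root().render(ConfigRenderOptions.concise())
> >     }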
> >
> >
> > On Wed, Sep 18, 2013 at 1:42 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> >
> > > Hi Shane,
> > >
> > > I agree with all these points. Improving the configuration system is
> > > one of the main things I'd like to have in the next release.
> > >
> > > > 1) Usually the application developers/users and the platform
> > > > administrators belong to two different teams, so it's better to
> > > > separate the scripts used by administrators from those used by
> > > > application users, e.g. put them in sbin and bin folders
> > > > respectively.
> > >
> > > Yup, right now we make no attempt to install on standard system
> > > paths.
> > >
> > > > 3) If there are multiple ways to specify an option, an overriding
> > > > rule should be present and should not be error-prone.
> > >
> > > Yes, I think this should always be Configuration class in code >
> > > system properties > env vars. Over time we will deprecate the env
> > > vars and maybe even system properties.
> > >
> > > > 4) Currently the options are set and read using system properties.
> > > > This is hard to manage and inconvenient for users. It would be good
> > > > to gather the options into one file using a format like XML or JSON.
> > >
> > > I think this is the main thing to do first -- pick one configuration
> > > class and change the code to use it.
> > >
> > > > Our rough proposal:
> > > >
> > > >   - Scripts
> > > >
> > > >   1. make an "sbin" folder containing all the scripts for
> > > >      administrators, specifically:
> > > >      - all service administration scripts, i.e. start-*, stop-*,
> > > >        slaves.sh, *-daemons, *-daemon scripts
> > > >      - low-level or internally used utility scripts, i.e.
> > > >        compute-classpath, spark-config, spark-class, spark-executor
> > > >   2. make a "bin" folder containing all the scripts for application
> > > >      developers/users, specifically:
> > > >      - user-level app-running scripts, i.e. pyspark, spark-shell,
> > > >        and we propose to add a script "spark" for users to run
> > > >        applications (very much like spark-class but it may add some
> > > >        more control or convenience utilities)
> > > >      - scripts for status checking, e.g. Spark and Hadoop version
> > > >        checking, checking of running applications, etc. We can make
> > > >        this a separate script or add the functionality to the
> > > >        "spark" script.
> > > >   3. no wandering scripts outside the sbin and bin folders; a
> > > >      possible layout is sketched below
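> > > >
> > > >   For instance, sticking to the script names above, the layout
> > > >   would look like this (illustrative only):
> > > >
> > > >      sbin/  start-*, stop-*, slaves.sh, *-daemons, *-daemon,
> > > >             compute-classpath, spark-config, spark-class,
> > > >             spark-executor
> > > >      bin/   pyspark, spark-shell, spark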
> > >
> > > Makes sense.
> > >
> > > >   - Configurations/Options and overriding rule
> > > >
> > > >   1. Define a Configuration class which contains all the options
> > > >      available for a Spark application. A Configuration instance
> > > >      can be de-/serialized from/to a JSON-formatted file.
> > > >   2. Each application (SparkContext) has one Configuration
> > > >      instance, and it is initialized by the application which
> > > >      creates it (either read from a file, or passed from
> > > >      command-line options, or from env SPARK_JAVA_OPTS).
> > > >   3. When launching an Executor on a node, the Configuration is
> > > >      first initialized using the node-local configuration file as
> > > >      the default. The Configuration passed from the application
> > > >      driver context will override any options specified in that
> > > >      default.
> > >
> > > This sounds great to me! The one thing I'll add is that we might
> > > want to prevent applications from overriding certain settings on
> > > each node, such as work directories. The best way is probably to
> > > just ignore the app's version of those settings in the Executor, as
> > > in the sketch below.
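> > >
> > > A small sketch of that executor-side merge (the key name is just the
> > > example from this thread; the helper itself is hypothetical):
> > >
> > >     import com.typesafe.config.Config
> > >
> > >     // Settings the app must never override on a node, e.g. work dirs.
> > >     val nodeLocalOnly = Set("spark.local.dir")
> > >
> > >     def executorConfig(nodeLocal: Config, fromDriver: Config): Config = {
> > >       // Drop the protected keys from the app's view first...
> > >       val allowed = nodeLocalOnly.foldLeft(fromDriver)(_.withoutPath(_))
> > >       // ...then let the remaining app settings override the
> > >       // node-local defaults (the receiver of withFallback wins).
> > >       allowed.withFallback(nodeLocal)
> > >     }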
> > >
> > > If you guys would like, feel free to write up this design on
> > > SPARK-544 and start working on it. I think it looks good.
> > >
> > > Matei
> >
> >
> >
> >
> > --
> > Shane Huang
> > Intel Asia-Pacific R&D Ltd.
> > Email: shengsheng.huang@intel.com
> >
>



-- 
Shane Huang
Intel Asia-Pacific R&D Ltd.
Email: shengsheng.huang@intel.com
