spark-dev mailing list archives

From Shane Huang <shannie.hu...@gmail.com>
Subject Re: Propose to Re-organize the scripts and configurations
Date Sun, 22 Sep 2013 04:07:05 GMT
I've also created a new issue, SPARK-915, to track the re-organization of the
scripts, since SPARK-544 only covers configuration:
https://spark-project.atlassian.net/browse/SPARK-915


On Wed, Sep 18, 2013 at 1:42 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> Hi Shane,
>
> I agree with all these points. Improving the configuration system is one
> of the main things I'd like to have in the next release.
>
> > 1) Usually the application developers/users and the platform administrators
> > belong to two teams, so it's better to separate the scripts used by
> > administrators from those used by application users, e.g. put them in sbin
> > and bin folders respectively.
>
> Yup, right now we don't make any attempt to install on standard system
> paths.
>
> > 3) If there are multiple ways to specify an option, there should be a clear
> > overriding rule, and it should not be error-prone.
>
> Yes, I think this should always be Configuration class in code > system
> properties > env vars. Over time we will deprecate the env vars and maybe
> even system properties.
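A minimal sketch of that overriding rule, with hypothetical names (nothing
below is an existing Spark API), might look like this:

    // Hypothetical sketch of the precedence: a value set in code wins,
    // then a system property, then an environment variable.
    def resolve(key: String,
                inCode: Map[String, String],
                envVar: String): Option[String] =
      inCode.get(key)
        .orElse(sys.props.get(key))
        .orElse(sys.env.get(envVar))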
>
> > 4) Currently the options are set and read using system properties, which is
> > hard to manage and inconvenient for users. It would be good to gather the
> > options into one file using a format like XML or JSON.
>
> I think this is the main thing to do first -- pick one configuration class
> and change the code to use this.
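To illustrate the kind of change this implies at call sites (the conf object
and its get method are hypothetical placeholders for whatever configuration
class ends up being chosen):

    // Before: components read raw system properties directly, each with its
    // own default.
    val localDir = System.getProperty("spark.local.dir", "/tmp")

    // After: everything goes through one configuration object, so the
    // overriding rule lives in a single place.
    val localDir2 = conf.get("spark.local.dir", "/tmp")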
>
> > Our rough proposal:
> >
> >   - Scripts
> >
> >   1. Make an "sbin" folder containing all the scripts for administrators,
> >      specifically:
> >      - all service administration scripts, i.e. start-*, stop-*,
> >        slaves.sh, *-daemons, *-daemon scripts
> >      - low-level or internally used utility scripts, i.e.
> >        compute-classpath, spark-config, spark-class, spark-executor
> >   2. Make a "bin" folder containing all the scripts for application
> >      developers/users, specifically:
> >      - user-level app running scripts, i.e. pyspark, spark-shell, and we
> >        propose to add a script "spark" for users to run applications (very
> >        much like spark-class but possibly with some more control or
> >        convenience utilities)
> >      - scripts for status checking, e.g. Spark and Hadoop version checking,
> >        checking of running applications, etc. We can make this a separate
> >        script or add the functionality to the "spark" script.
> >   3. No wandering scripts outside the sbin and bin folders
>
> Makes sense.
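For concreteness, the layout proposed in the quoted list above might end up
looking roughly like this (the exact set of files is illustrative only):

    bin/                      # application developers/users
        spark-shell
        pyspark
        spark                 # proposed new launcher script
    sbin/                     # administrators
        start-master.sh  stop-master.sh  start-slaves.sh  stop-slaves.sh
        start-all.sh     stop-all.sh     slaves.sh
        spark-daemon.sh  spark-daemons.sh
        spark-config.sh  spark-class  spark-executor  compute-classpath.sh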
>
> >   -  Configurations/Options and overriding rule
> >
> >   1. Define a Configuration class which contains all the options available
> >      for a Spark application. A Configuration instance can be de-/serialized
> >      from/to a JSON-formatted file.
> >   2. Each application (SparkContext) has one Configuration instance, and it
> >      is initialized by the application which creates it (either read from a
> >      file, or passed from command line options or the env var SPARK_JAVA_OPTS).
> >   3. When launching an Executor on a node, the Configuration is first
> >      initialized from the node-local configuration file as the default. The
> >      Configuration passed from the application driver context then overrides
> >      any options specified in the defaults.
>
> This sounds great to me! The one thing I'll add is that we might want to
> prevent applications from overriding certain settings on each node, such as
> work directories. The best way is probably just to ignore the app's version
> of those settings in the Executor.
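A rough sketch of how the Configuration class and the executor-side merge
described above might fit together (the class, its JSON handling, and the
protected-key set are illustrative assumptions, not an existing Spark API):

    import scala.util.parsing.json.JSON

    // Hypothetical Configuration: option names mapped to values, loadable
    // from a JSON file of string key/value pairs.
    class Configuration(val settings: Map[String, String]) {
      // Merge in an application's Configuration: the app's values override the
      // node-local defaults, except for keys the node protects (e.g. work
      // directories), which are ignored as suggested above.
      def overrideWith(app: Configuration,
                       protectedKeys: Set[String]): Configuration = {
        val allowed = app.settings.filter { case (k, _) => !protectedKeys.contains(k) }
        new Configuration(settings ++ allowed)
      }
    }

    object Configuration {
      // Load node-local defaults from a JSON object of key/value pairs,
      // using the JSON parser bundled with Scala at the time for brevity.
      def fromJsonFile(path: String): Configuration = {
        val text   = scala.io.Source.fromFile(path).mkString
        val parsed = JSON.parseFull(text).getOrElse(Map.empty)
                       .asInstanceOf[Map[String, Any]]
        new Configuration(parsed.map { case (k, v) => k -> v.toString })
      }
    }

    // On an executor: node-local defaults first, then the driver's
    // Configuration, with protected settings left untouched, e.g.:
    // val defaults  = Configuration.fromJsonFile("conf/spark-defaults.json")
    // val effective = defaults.overrideWith(driverConf, Set("spark.local.dir"))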
>
> If you guys would like, feel free to write up this design on SPARK-544 and
> start working on it. I think it looks good.
>
> Matei




-- 
Shane Huang
Intel Asia-Pacific R&D Ltd.
Email: shengsheng.huang@intel.com
