spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Spark config option 'expression language' feedback request
Date Tue, 31 Mar 2015 06:37:56 GMT
Reviving this to see if others would like to chime in about this
"expression language" for config options.


On Fri, Mar 13, 2015 at 7:57 PM, Dale Richardson <dale__r@hotmail.com>
wrote:

> Mridul, I may have added some confusion by giving examples in completely
> different areas. For example, the number of cores available for tasking on
> each worker machine is a resource-controller-level configuration variable.
> In standalone mode (i.e. using Spark's home-grown resource manager), the
> configuration variable SPARK_WORKER_CORES is an item that Spark admins can
> set (and that we can use expressions for). The equivalent variable for YARN
> (yarn.nodemanager.resource.cpu-vcores) is only used by YARN's node manager
> setup; it is set by YARN administrators and is outside the control of Spark
> (and most users). If you are not a cluster administrator, then both
> variables are irrelevant to you. The same goes for SPARK_WORKER_MEMORY.
>
> As for spark.executor.memory: as there is no way to know the attributes
> of a machine before a task is allocated to it, we cannot use any of the
> JVMInfo functions. For options like that, the expression parser can easily
> be limited to supporting only the different byte units of scale (kb/mb/gb,
> etc.) and references to other configuration variables.
> Regards, Dale.
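A minimal sketch of the restricted parser Dale describes: byte-unit suffixes plus references to other configuration options, and nothing else. The names and grammar here are assumptions for illustration; the PR's actual parser is part of Spark and may differ.

```python
import re

# Assumed unit table for the restricted "byte units of scale" parser.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def resolve_bytes(value, config):
    """Resolve a size expression such as '1.5gb', or a reference to
    another config key, into a byte count. No system-inspection
    functions are available in this restricted mode."""
    value = value.strip().lower()
    m = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*(b|kb|mb|gb|tb)?", value)
    if m:
        number, unit = m.groups()
        return int(float(number) * UNITS[unit or "b"])
    # Otherwise treat the value as a reference to another option.
    return resolve_bytes(config[value], config)

config = {
    "spark.executor.memory": "1.5gb",
    # Hypothetical option defined by reference to another option.
    "spark.memory.offheap.size": "spark.executor.memory",
}
```

With this sketch, `resolve_bytes("spark.memory.offheap.size", config)` follows the reference chain and yields the same byte count as `"1.5gb"`.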
>
>
>
>
> > Date: Fri, 13 Mar 2015 17:30:51 -0700
> > Subject: Re: Spark config option 'expression language' feedback request
> > From: mridul@gmail.com
> > To: dale__r@hotmail.com
> > CC: dev@spark.apache.org
> >
> > Let me try to rephrase my query.
> > How can a user specify, for example, what the executor memory or the
> > number of cores should be?
> >
> > I don't want a situation where some variables can be specified using
> > one set of idioms (from this PR, for example) and another set cannot
> > be.
> >
> >
> > Regards,
> > Mridul
> >
> >
> >
> >
> > On Fri, Mar 13, 2015 at 4:06 PM, Dale Richardson <dale__r@hotmail.com>
> wrote:
> > >
> > >
> > >
> > > Thanks for your questions, Mridul.
> > > I assume you are referring to how the functionality to query system
> > > state works in YARN and Mesos? The APIs used are the standard JVM
> > > APIs, so the functionality will work without change. There is no real
> > > use case for using 'physicalMemoryBytes' in these cases though, as the
> > > JVM size has already been limited by the resource manager.
> > > Regards, Dale.
> > >> Date: Fri, 13 Mar 2015 08:20:33 -0700
> > >> Subject: Re: Spark config option 'expression language' feedback
> request
> > >> From: mridul@gmail.com
> > >> To: dale__r@hotmail.com
> > >> CC: dev@spark.apache.org
> > >>
> > >> I am curious how you are going to support these over Mesos and YARN.
> > >> Any configuration change like this should be applicable to all of
> > >> them, not just local and standalone modes.
> > >>
> > >> Regards
> > >> Mridul
> > >>
> > >> On Friday, March 13, 2015, Dale Richardson <dale__r@hotmail.com>
> wrote:
> > >>
> > >> >
> > >> > PR#4937 (https://github.com/apache/spark/pull/4937) is a feature to
> > >> > allow Spark configuration options (whether set on the command line,
> > >> > in an environment variable, or in a configuration file) to be
> > >> > specified via a simple expression language.
> > >> >
> > >> > Such a feature has the following end-user benefits:
> > >> > - Allows the flexibility to specify time intervals or byte
> > >> > quantities in appropriate and easy-to-follow units, e.g. 1 week
> > >> > rather than 604800 seconds
> > >> >
> > >> > - Allows a configuration option to be scaled in relation to a
> > >> > system attribute, e.g.:
> > >> >
> > >> > SPARK_WORKER_CORES = numCores - 1
> > >> >
> > >> > SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB
> > >> >
> > >> > - Gives the ability to scale multiple configuration options
> > >> > together, e.g.:
> > >> >
> > >> > spark.driver.memory = 0.75 * physicalMemoryBytes
> > >> >
> > >> > spark.driver.maxResultSize = spark.driver.memory * 0.8
> > >> >
> > >> >
> > >> > The following functions are currently supported by this PR:
> > >> > NumCores:             Number of cores assigned to the JVM (usually
> > >> > == physical machine cores)
> > >> > PhysicalMemoryBytes:  Memory size of the hosting machine
> > >> > JVMTotalMemoryBytes:  Current bytes of memory allocated to the JVM
> > >> > JVMMaxMemoryBytes:    Maximum number of bytes of memory available
> > >> > to the JVM
> > >> > JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes
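For illustration only, the first two functions above have rough operating-system analogues outside the JVM (the PR itself uses standard JVM APIs). A POSIX-only Python sketch of equivalent queries:

```python
import os

# Rough analogues of the PR's system-inspection functions; illustration
# only, not the PR's implementation (which calls JVM APIs).
def num_cores():
    # Number of logical CPUs visible to the process.
    return os.cpu_count()

def physical_memory_bytes():
    # POSIX-only: total physical pages times page size.
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
```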
> > >> >
> > >> >
> > >> > I was wondering if anybody on the mailing list has any further
> > >> > ideas on other functions that could be useful to have when
> > >> > specifying Spark configuration options?
> > >> > Regards, Dale.
> > >> >
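One way the cross-referencing examples above (spark.driver.maxResultSize derived from spark.driver.memory, which is itself derived from physicalMemoryBytes) could be resolved is by evaluating each option lazily against an environment of system attributes and other options. This is a minimal sketch under assumed names and grammar, not the PR's actual Scala implementation:

```python
import ast
import operator

# Supported arithmetic for the sketch grammar.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(expr, env):
    """Evaluate a config expression; bare names (numCores) and dotted
    names (spark.driver.memory) are looked up in env, recursing when
    the referenced value is itself an expression string."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        # ast.Name or ast.Attribute: recover the (dotted) identifier.
        name = ast.unparse(node)
        value = env[name]
        return evaluate(value, env) if isinstance(value, str) else value
    return walk(ast.parse(expr, mode="eval"))

env = {
    "physicalMemoryBytes": 8 * 1024**3,  # assumed 8 GB host
    "spark.driver.memory": "0.75 * physicalMemoryBytes",
    "spark.driver.maxResultSize": "spark.driver.memory * 0.8",
}
```

Evaluating `"spark.driver.maxResultSize"` against this environment chases both references, yielding 0.75 * 0.8 of physical memory. A production version would also need cycle detection so mutually referencing options fail cleanly.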
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > For additional commands, e-mail: dev-help@spark.apache.org
> >
>
>
