mesos-dev mailing list archives

From "Chawla,Sumit " <sumitkcha...@gmail.com>
Subject Re: Mesos Spark Fine Grained Execution - CPU count
Date Mon, 19 Dec 2016 23:23:52 GMT
Tim,

We will try to run the application in coarse-grained mode and share the
findings with you.
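
For reference, a minimal sketch of the configuration we plan to try
(coarse-grained mode plus dynamic allocation). The property names are the
standard Spark 1.6 ones; the master URL is a placeholder, and we are assuming
the Mesos external shuffle service is already running on each agent:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: coarse-grained Mesos with dynamic allocation (Spark 1.6).
    // "mesos://zk://master:2181/mesos" is a placeholder master URL.
    val conf = new SparkConf()
      .setMaster("mesos://zk://master:2181/mesos")
      .setAppName("coarse-grained-test")
      .set("spark.mesos.coarse", "true")              // coarse-grained mode
      .set("spark.dynamicAllocation.enabled", "true") // scale executors with load
      .set("spark.shuffle.service.enabled", "true")   // required by dynamic allocation
    val sc = new SparkContext(conf)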

Regards
Sumit Chawla


On Mon, Dec 19, 2016 at 3:11 PM, Timothy Chen <tnachen@gmail.com> wrote:

> Dynamic allocation works with coarse-grained mode only; we weren't aware
> of a need for fine-grained mode after we enabled dynamic allocation
> support in coarse-grained mode.
>
> What's the reason you're running fine-grained mode instead of
> coarse-grained + dynamic allocation?
>
> Tim
>
> On Mon, Dec 19, 2016 at 2:45 PM, Mehdi Meziane
> <mehdi.meziane@ldmobile.net> wrote:
> > We would be interested in the results if you give dynamic allocation
> > with Mesos a try!
> >
> >
> > ----- Original Mail -----
> > From: "Michael Gummelt" <mgummelt@mesosphere.io>
> > To: "Sumit Chawla" <sumitkchawla@gmail.com>
> > Cc: user@mesos.apache.org, dev@mesos.apache.org, "User" <user@spark.apache.org>, dev@spark.apache.org
> > Sent: Monday, December 19, 2016, 22:42:55 GMT +01:00 Amsterdam / Berlin /
> > Bern / Rome / Stockholm / Vienna
> > Subject: Re: Mesos Spark Fine Grained Execution - CPU count
> >
> >
> >> Is this problem of idle executors sticking around solved in Dynamic
> >> Resource Allocation?  Is there some timeout after which idle executors
> >> can just shut down and clean up their resources?
> >
> > Yes, that's exactly what dynamic allocation does.  But again, I have no
> > idea what the state of dynamic allocation + Mesos is.
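> >
> > (For reference, the knob that controls this in Spark 1.6 is
> > spark.dynamicAllocation.executorIdleTimeout, which defaults to 60s.
> > A one-line sketch, assuming a SparkConf named conf and an illustrative
> > value:
> >
> >     conf.set("spark.dynamicAllocation.executorIdleTimeout", "30s")
> >
> > Idle executors are removed once that timeout expires.)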
> >
> > On Mon, Dec 19, 2016 at 1:32 PM, Chawla,Sumit <sumitkchawla@gmail.com>
> > wrote:
> >>
> >> Great.  That makes much better sense now.  What would be the reason to
> >> set spark.mesos.mesosExecutor.cores to more than 1, since this number
> >> doesn't include the number of cores for tasks?
> >>
> >> So in my case it seems like 30 CPUs are allocated to executors, and
> >> there are 48 tasks, so 48 + 30 = 78 CPUs.  I am noticing that this gap
> >> of 30 is maintained until the last task exits, which explains the gap.
> >> Thanks, everyone.  I am still not sure how this number 30 is calculated.
> >> (Is it dynamic based on current resources, or is it some configuration?
> >> I have 32 nodes in my cluster.)
> >>
> >> Is this problem of idle executors sticking around solved in Dynamic
> >> Resource Allocation?  Is there some timeout after which idle executors
> >> can just shut down and clean up their resources?
> >>
> >>
> >> Regards
> >> Sumit Chawla
> >>
> >>
> >> On Mon, Dec 19, 2016 at 12:45 PM, Michael Gummelt
> >> <mgummelt@mesosphere.io> wrote:
> >>>
> >>> > I should presume that the number of executors should be less than
> >>> > the number of tasks.
> >>>
> >>> No.  Each executor runs 0 or more tasks.
> >>>
> >>> Each executor consumes 1 CPU, and each task running on that executor
> >>> consumes another CPU.  You can customize this via
> >>> spark.mesos.mesosExecutor.cores
> >>> (https://github.com/apache/spark/blob/v1.6.3/docs/running-on-mesos.md)
> >>> and spark.task.cpus
> >>> (https://github.com/apache/spark/blob/v1.6.3/docs/configuration.md).
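> >>>
> >>> A minimal sketch using those two properties, with purely illustrative
> >>> values (both default to 1):
> >>>
> >>>     import org.apache.spark.SparkConf
> >>>
> >>>     val conf = new SparkConf()
> >>>       .set("spark.mesos.mesosExecutor.cores", "1") // CPUs each executor holds
> >>>       .set("spark.task.cpus", "1")                 // extra CPUs per running task
> >>>
> >>> With these values, a node running one executor and two tasks consumes
> >>> 1 + 2 * 1 = 3 CPUs.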
> >>>
> >>> On Mon, Dec 19, 2016 at 12:09 PM, Chawla,Sumit
> >>> <sumitkchawla@gmail.com> wrote:
> >>>>
> >>>> Ah, thanks.  Looks like I skipped reading this: "Neither will
> >>>> executors terminate when they’re idle."
> >>>>
> >>>> So in my job scenario, I should presume that the number of executors
> >>>> should be less than the number of tasks; ideally one executor should
> >>>> execute 1 or more tasks.  But I am observing something strange
> >>>> instead.  I start a Spark job with 48 partitions.  In the Mesos UI I
> >>>> see that the number of tasks is 48, but the number of CPUs is 78,
> >>>> which is way more than 48.  Here I am assuming that 1 CPU is 1
> >>>> executor.  I am not specifying any configuration to set the number of
> >>>> cores per executor.
> >>>>
> >>>> Regards
> >>>> Sumit Chawla
> >>>>
> >>>>
> >>>> On Mon, Dec 19, 2016 at 11:35 AM, Joris Van Remoortere
> >>>> <joris@mesosphere.io> wrote:
> >>>>>
> >>>>> That makes sense. From the documentation it looks like the executors
> >>>>> are not supposed to terminate:
> >>>>>
> >>>>> http://spark.apache.org/docs/latest/running-on-mesos.html#fine-grained-deprecated
> >>>>>>
> >>>>>> Note that while Spark tasks in fine-grained will relinquish cores
> >>>>>> as they terminate, they will not relinquish memory, as the JVM does
> >>>>>> not give memory back to the Operating System.  Neither will
> >>>>>> executors terminate when they’re idle.
> >>>>>
> >>>>>
> >>>>> I suppose your task-to-executor CPU ratio is low enough that it
> >>>>> looks like most of the resources are not being reclaimed.  If your
> >>>>> tasks were using significantly more CPU, the amortized cost of the
> >>>>> idle executors would not be such a big deal.
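> >>>>>
> >>>>> (To put hypothetical numbers on it: at the spark.task.cpus default
> >>>>> of 1, 48 running tasks plus ~30 executor CPUs is roughly 38%
> >>>>> overhead; if each task instead used 8 CPUs, the same 30 idle CPUs
> >>>>> would be only about 7% of the 384 + 30 total.)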
> >>>>>
> >>>>>
> >>>>> —
> >>>>> Joris Van Remoortere
> >>>>> Mesosphere
> >>>>>
> >>>>> On Mon, Dec 19, 2016 at 11:26 AM, Timothy Chen <tnachen@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi Chawla,
> >>>>>>
> >>>>>> One possible reason is that Mesos fine-grained mode also takes up
> >>>>>> cores to run the executor on each host, so if you have 20 agents
> >>>>>> running fine-grained executors, they will take up 20 cores while
> >>>>>> they're still running.
> >>>>>>
> >>>>>> Tim
> >>>>>>
> >>>>>> On Fri, Dec 16, 2016 at 8:41 AM, Chawla,Sumit
> >>>>>> <sumitkchawla@gmail.com> wrote:
> >>>>>> > Hi
> >>>>>> >
> >>>>>> > I am using Spark 1.6, and I have one query about the fine-grained
> >>>>>> > model in Spark.  I have a simple Spark application which
> >>>>>> > transforms A -> B.  It's a single-stage application.  The program
> >>>>>> > starts with 48 partitions.  When the program starts running, the
> >>>>>> > Mesos UI shows 48 tasks and 48 CPUs allocated to the job.  Now as
> >>>>>> > the tasks get done, the number of active tasks starts decreasing.
> >>>>>> > However, the number of CPUs does not decrease proportionally.
> >>>>>> > When the job was about to finish, there was a single remaining
> >>>>>> > task, yet the CPU count was still 20.
> >>>>>> >
> >>>>>> > My question is: why is there no one-to-one mapping between tasks
> >>>>>> > and CPUs in fine-grained mode?  How can these CPUs be released
> >>>>>> > when the job is done, so that other jobs can start?
> >>>>>> >
> >>>>>> >
> >>>>>> > Regards
> >>>>>> > Sumit Chawla
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Michael Gummelt
> >>> Software Engineer
> >>> Mesosphere
> >>
> >>
> >
> >
> >
> > --
> > Michael Gummelt
> > Software Engineer
> > Mesosphere
>
