From: Michael Gummelt
Date: Mon, 26 Dec 2016 13:04:05 -0800
Subject: Re: Mesos Spark Fine Grained Execution - CPU count
To: Jacek Laskowski
Cc: Davies Liu, Dev, Timothy Chen, Mehdi Meziane, "user@mesos.apache.org", User, "dev@spark.apache.org"

In fine-grained mode (which is deprecated), Spark tasks (which are threads) were implemented as Mesos tasks. When a Mesos task starts and stops, its underlying cgroup, and therefore the resources it's consuming on the cluster, grows or shrinks based on the resources allocated to the tasks, which in Spark is just CPU. This is what I mean by CPU usage "elastically growing".

However, all Mesos tasks are run by an "executor", which has its own resource allocation. In Spark, the executor is the JVM, and all memory is allocated to the executor, because JVMs can't relinquish memory. If memory were allocated to the tasks, then the cgroup's memory allocation would shrink when the task terminated, but the JVM's memory consumption would stay constant, and the JVM would OOM.

And, without dynamic allocation, executors never terminate during the duration of a Spark job, because even if they're idle (no tasks), they still may be hosting shuffle files. That's why dynamic allocation depends on an external shuffle service. Since executors never terminate, and all memory is allocated to the executors, Spark jobs even in fine-grained mode only grow in memory allocation, they don't shrink.
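
As a rough sketch of the alternative being recommended in this thread (coarse-grained mode plus dynamic allocation), assuming the Mesos external shuffle service is already running on every agent; the app name and master URL below are placeholders, and the property names are standard Spark settings:

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative configuration only, not a drop-in recipe.
    val conf = new SparkConf()
      .setAppName("example-app")                       // placeholder
      .setMaster("mesos://zk://host:2181/mesos")       // placeholder Mesos master
      .set("spark.mesos.coarse", "true")               // coarse-grained mode
      .set("spark.dynamicAllocation.enabled", "true")  // let executors come and go
      .set("spark.shuffle.service.enabled", "true")    // external shuffle service, so idle executors can be removed safely
    val sc = new SparkContext(conf)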

On Mon, Dec 26, 2016 at 12:39 PM, Jacek Laskowski <jacek@japila.pl> wrote:
Hi Michael,

That caught my attention...

Could you please elaborate on "elastically grow and shrink CPU usage"
and how it really works under the covers? It seems that CPU usage is
just a "label" for an executor on Mesos. Where's this in the = code?

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Mon, Dec 26, 2016 at 6:25 PM, Michael Gummelt <mgummelt@mesosphere.io> wrote:
>> Using 0 for spark.mesos.mesosExecutor.cores is better than dynamic
>> allocation
>
> Maybe for CPU, but definitely not for memory. Executors never shut down in
> fine-grained mode, which means you only elastically grow and shrink CPU
> usage, not memory.
>
> On Sat, Dec 24, 2016 at 10:14 PM, Davies Liu <davies.liu@gmail.com> wrote:
>>
>> Using 0 for spark.mesos.mesosExecutor.cores is better than dynamic
>> allocation, but have to pay a little more overhead for launching a
>> task, which should be OK if the task is not trivial.
>>
>> Since the direct result (up to 1M by default) will also go through
>> mesos, it's better to tune it lower, otherwise mesos could become the
>> bottleneck.
>>
>> spark.task.maxDirectResultSize
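
(Illustratively, the two settings Davies mentions; a sketch only, and the values are placeholders rather than tuning advice:)

    // Sketch: no standing CPU reservation for the fine-grained executor, and a
    // lower cap on task results returned directly (which travel through Mesos here).
    val conf = new org.apache.spark.SparkConf()
      .set("spark.mesos.mesosExecutor.cores", "0")
      .set("spark.task.maxDirectResultSize", "131072")  // 128 KB; the default is 1 MB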
>>
>> On Mon, Dec 19, 2016 at 3:23 PM, Chawla,Sumit <sumitkchawla@gmail.com>
>> wrote:
>> > Tim,
>> >
>> > We will try to run the application in coarse grain mode, and share the
>> > findings with you.
>> >
>> > Regards
>> > Sumit Chawla
>> >
>> >
>> > On Mon, Dec 19, 2016 at 3:11 PM, Timothy Chen <tnachen@gmail.com> wrote:
>> >
>> >> Dynamic allocation works with Coarse grain mode only; we weren't aware of
>> >> a need for Fine grain mode after we enabled dynamic allocation support
>> >> on the coarse grain mode.
>> >>
>> >> What's the reason you're running fine grain mode instead of coarse
>> >> grain + dynamic allocation?
>> >>
>> >> Tim
>> >>
>> >> On Mon, Dec 19, 2016 at 2:45 PM, Mehdi Meziane
>> >> <mehdi.meziane@ldmobile.net> wrote:
>> >> > We will be interested in the results if you give a try to Dynamic
>> >> allocation
>> >> > with mesos !
>> >> >
>> >> >
>> >> > ----- Original Message -----
>> >> > From: "Michael Gummelt" <mgummelt@mesosphere.io>
>> >> > To: "Sumit Chawla" <sumitkchawla@gmail.com>
>> >> > Cc: user@mesos.apache.org, dev@mesos.apache.org, "User"
>> >> > <user@spark.apache.org>, dev@spark.apache.org
>> >> > Sent: Monday, 19 December 2016 22:42:55 GMT +01:00 Amsterdam / Berlin /
>> >> > Bern / Rome / Stockholm / Vienna
>> >> > Subject: Re: Mesos Spark Fine Grained Execution - CPU count
>> >> >
>> >> >
>> >> >> Is this problem of idle executors sticking around solved in Dynamic
>> >> >> Resource Allocation? Is there some timeout after which Idle
>> >> >> executors
>> >> can
>> >> >> just shut down and clean up their resources.
>> >> >
>> >> > Yes, that's exactly what dynamic allocation does. But again I have
>> >> > no
>> >> idea
>> >> > what the state of dynamic allocation + mesos is.
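
(For what it's worth, the timeout being asked about does exist as a dynamic allocation setting; a minimal sketch with illustrative values:)

    // Sketch: how long an idle executor survives before dynamic allocation removes it.
    // Executors holding cached data are governed by a separate (longer) timeout.
    val conf = new org.apache.spark.SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
      .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")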
>> >> >
>> >> > On Mon, Dec 19, 2016 at 1:32 PM, Chawla,Sumit
>> >> > <sumitkchawla@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Great. Makes much better sense now. What would be the reason to have
>> >> >> spark.mesos.mesosExecutor.cores more than 1, as this number doesn't
>> >> include
>> >> >> the number of cores for tasks.
>> >> >>
>> >> >> So in my case it seems like 30 CPUs are allocated to executors. And
>> >> there
>> >> >> are 48 tasks, so 48 + 30 = 78 CPUs. And I am noticing this gap of
>> >> >> 30 is
>> >> >> maintained till the last task exits. This explains the gap.
>> >> >> Thanks
>> >> >> everyone. I am still not sure how this number 30 is calculated. (
>> >> >> Is
>> >> it
>> >> >> dynamic based on current resources, or is it some configuration. I
>> >> have 32
>> >> >> nodes in my cluster).
>> >> >>
>> >> >> Is this problem of idle executors sticking around solved in Dynamic
>> >> >> Resource Allocation? Is there some timeout after which Idle
>> >> >> executors
>> >> can
>> >> >> just shut down and clean up their resources.
>> >> >>
>> >> >>
>> >> >> Regards
>> >> >> Sumit Chawla
>> >> >>
>> >> >>
>> >> >> On Mon, Dec 19, 2016 at 12:45 PM, Michael Gummelt <mgummelt@mesosphere.io>
>> >> >> wrote:
>> >> >>>
>> >> >>> > I should presume that no. of executors should be less than
>> >> >>> > number
>> >> of
>> >> >>> > tasks.
>> >> >>>
>> >> >>> No. Each executor runs 0 or more tasks.
>> >> >>>
>> >> >>> Each executor consumes 1 CPU, and each task running on that
>> >> >>> executor
>> >> >>> consumes another CPU. You can customize this via
>> >> >>> spark.mesos.mesosExecutor.cores
>> >> >>>
>> >> >>> (https://github.com/apache/spark/blob/v1.6.3/docs/running-on-mesos.md)
>> >> and
>> >> >>> spark.task.cpus
>> >> >>> (https://github.com/apache/spark/blob/v1.6.3/docs/configuration.md)
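
(To make the accounting concrete, a back-of-envelope sketch using the numbers from this thread; it assumes the defaults of 1 for both properties above and one executor per agent that has run at least one task:)

    // Rough fine-grained CPU accounting; the numbers are illustrative, not measured.
    val mesosExecutorCores = 1   // spark.mesos.mesosExecutor.cores (default)
    val taskCpus           = 1   // spark.task.cpus (default)
    val executors          = 30  // agents that have hosted at least one task
    val runningTasks       = 48  // partitions in the stage

    // CPUs shown in the Mesos UI while every task is running:
    val peakCpus = executors * mesosExecutorCores + runningTasks * taskCpus   // 30 + 48 = 78

    // Task CPUs are released as tasks finish, but the executor CPUs stay
    // reserved until the job ends, which is the persistent gap observed above:
    val residualCpus = executors * mesosExecutorCores                         // 30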
>> >> >>>
>> >> >>> On Mon, Dec 19, 2016 at 12:09 PM, Chawla,Sumit
>> >> >>> <sumitkchawla@gmail.com>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> Ah thanks. Looks like I skipped reading this "Neither will
>> >> >>>> executors
>> >> >>>> terminate when they're idle."
>> >> >>>>
>> >> >>>> So in my job scenario, I should presume that the number of executors
>> >> >>>> should
>> >> >>>> be less than the number of tasks. Ideally one executor should execute
>> >> >>>> 1
>> >> or more
>> >> >>>> tasks. But I am observing something strange instead. I start my
>> >> >>>> job
>> >> with
>> >> >>>> 48 partitions for a Spark job. In the Mesos UI I see that the number of
>> >> >>>> tasks
>> >> is 48,
>> >> >>>> but the no. of CPUs is 78, which is way more than 48. Here I am
>> >> >>>> assuming
>> >> that 1
>> >> >>>> CPU is 1 executor. I am not specifying any configuration to set
>> >> number of
>> >> >>>> cores per executor.
>> >> >>>>
>> >> >>>> Regards
>> >> >>>> Sumit Chawla
>> >> >>>>
>> >> >>>>
>> >> >>>> On Mon, Dec 19, 2016 at 11:35 AM, Joris Van Remoortere
>> >> >>>> <joris@mesosphere.io> wrote:
>> >> >>>>>
>> >> >>>>> That makes sense. From the documentation it looks like the
>> >> >>>>> executors
>> >> >>>>> are not supposed to terminate:
>> >> >>>>>
>> >> >>>>> http://spark.apache.org/docs/latest/running-on-mesos.html#fine-grained-deprecated
>> >> >>>>>>
>> >> >>>>>> Note that while Spark tasks in fine-grained will relinquish
>> >> >>>>>> cores as
>> >> >>>>>> they terminate, they will not relinquish memory, as the JVM does
>> >> not give
>> >> >>>>>> memory back to the Operating System. Neither will executors
>> >> terminate when
>> >> >>>>>> they're idle.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> I suppose your task to executor CPU ratio is low enough that it
>> >> >>>>> looks
>> >> >>>>> like most of the resources are not being reclaimed. If your tasks
>> >> were using
>> >> >>>>> significantly more CPU, the amortized cost of the idle executors
>> >> would not be
>> >> >>>>> such a big deal.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Joris Van Remoortere
>> >> >>>>> Mesosphere
>> >> >>>>>
>> >> >>>>> On Mon, Dec 19, 2016 at 11:26 AM, Timothy Chen
>> >> >>>>> <tnachen@gmail.com>
>> >> >>>>> wrote:
>> >> >>>>>>
>> >> >>>>>> Hi Chawla,
>> >> >>>>>>
>> >> >>>>>> One possible reason is that Mesos fine grain mode also takes up
>> >> cores
>> >> >>>>>> to run the executor per host, so if you have 20 agents running
>> >> >>>>>> Fine
>> >> >>>>>> grained executor it will take up 20 cores while it's still
>> >> >>>>>> running.
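
(Worked out with those numbers: 20 agents * 1 CPU per fine-grained executor = 20 CPUs held for the executors alone, regardless of how many tasks remain; that matches the residual CPU count reported in the original question below.)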
>> >> >>>>>>
>> >> >>>>>> Tim
>> >> >>>>>>
>> >> >>>>>> On Fri, Dec 16, 2016 at 8:41 AM, Chawla,Sumit <sumitkchawla@gmail.com>
>> >> >>>>>> wrote:
>> >> >>>>>> > Hi
>> >> >>>>>> >
>> >> >>>>>> > I am using Spark 1.6. I have one query about Fine Grained
>> >> >>>>>> > model in
>> >> >>>>>> > Spark.
>> >> >>>>>> > I have a simple Spark application which transforms A -> B.
>> >> >>>>>> > It's a
>> >> >>>>>> > single
>> >> >>>>>> > stage application. To begin the program, it starts with 48
>> >> >>>>>> > partitions.
>> >> >>>>>> > When the program starts running, in the Mesos UI it shows 48 tasks
>> >> >>>>>> > and
>> >> >>>>>> > 48 CPUs
>> >> >>>>>> > allocated to the job. Now as the tasks get done, the number of
>> >> >>>>>> > active
>> >> >>>>>> > tasks
>> >> >>>>>> > number starts decreasing. However, the number of CPUs does
>> >> >>>>>> > not
>> >> >>>>>> > decrease
>> >> >>>>>> > proportionally. When the job was about to finish, there was a
>> >> single
>> >> >>>>>> > remaining task, however the CPU count was still 20.
>> >> >>>>>> >
>> >> >>>>>> > My question is: why is there no one-to-one mapping between
>> >> >>>>>> > tasks
>> >> >>>>>> > and cpus
>> >> >>>>>> > in Fine grained? How can these CPUs be released when the job
>> >> >>>>>> > is
>> >> >>>>>> > done, so
>> >> >>>>>> > that other jobs can start.
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> > Regards
>> >> >>>>>> > Sumit Chawla
>> >> >>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> Michael Gummelt
>> >> >>> Software Engineer
>> >> >>> Mesosphere
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Michael Gummelt
>> >> > Software Engineer
>> >> > Mesosphere
>> >>
>>
>>
>>
>> --
>>  - Davies
>
>
>
>
> --
> Michael Gummelt
> Software Engineer
> Mesosphere



--
Michael Gummelt
Software Engineer
Mesosphere