Subject: Standalone scheduler issue - one job occupies the whole cluster somehow
From: Mikhail Strebkov
To: user@spark.apache.org
Date: Mon, 25 Jan 2016 13:57:24 -0800

Hi all,

Recently we started having issues with one of our background processing scripts which we run on Spark. The cluster runs only two jobs: one runs for days, the other usually takes a couple of hours. Both jobs run on a cron schedule. The cluster is small, just 2 slaves, 24 cores, 25.4 GB of memory. Each job takes 6 cores and 6 GB per worker, so when both jobs are running that's 12 cores out of 24 and 24 GB out of 25.4 GB. But sometimes I see this:

https://www.dropbox.com/s/6uad4hrchqpihp4/Screen%20Shot%202016-01-25%20at%201.16.19%20PM.png

So basically the long-running job has somehow occupied the whole cluster, and the fast one can't make any progress because the cluster doesn't have the resources. This is what I see in the logs:

> 16/01/25 21:26:48 WARN TaskSchedulerImpl: Initial job has not accepted any
> resources; check your cluster UI to ensure that workers are registered and
> have sufficient resources

When I log in to the slaves I see this:

slave 1:

> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 450 --hostname 10.191.4.151 *--cores 1 --app-id app-20160124152439-1468*
> --worker-url akka.tcp://sparkWorker@10.191.4.151:53144/user/Worker
>
> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 451 --hostname 10.191.4.151 *--cores 1 --app-id app-20160124152439-1468*
> --worker-url akka.tcp://sparkWorker@10.191.4.151:53144/user/Worker

slave 2:

> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 1 --hostname 10.253.142.59 *--cores 3 --app-id app-20160124152439-1468*
> --worker-url akka.tcp://sparkWorker@10.253.142.59:33265/user/Worker
>
> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 448 --hostname 10.253.142.59 *--cores 1 --app-id app-20160124152439-1468*
> --worker-url akka.tcp://sparkWorker@10.253.142.59:33265/user/Worker

So somehow Spark created 4 executors, 2 on each machine (1 core + 1 core and 3 cores + 1 core), to get the total of 6 cores. But because the 6 GB setting is per executor, the job ends up occupying 24 GB instead of the 12 GB it would use with 2 executors (3 cores + 3 cores), and that blocks the other Spark job.
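For reference, here is roughly how I believe each job is configured, plus the back-of-the-envelope memory math for the layout above. The option values (spark.cores.max=6, spark.executor.memory=6g) and the master URL are my reconstruction of our setup, not copied from our submit scripts, so treat this as a sketch:

  // Assumed per-job configuration: 6 cores per application, 6 GB per executor.
  val conf = new org.apache.spark.SparkConf()
    .setMaster("spark://10.233.17.48:7077")   // assumed master URL/port
    .set("spark.cores.max", "6")              // cap each application at 6 cores total
    .set("spark.executor.memory", "6g")       // 6 GB is reserved PER EXECUTOR, not per job

  // Back-of-the-envelope memory accounting for the two layouts described above:
  val memoryPerExecutorGb = 6
  val normalLayout   = Seq(3, 3)              // 2 executors, one per slave, 3 cores each
  val observedLayout = Seq(1, 1, 3, 1)        // 4 executors, as seen on the slaves
  println(s"normal:   ${normalLayout.sum} cores, ${normalLayout.length * memoryPerExecutorGb} GB")
  println(s"observed: ${observedLayout.sum} cores, ${observedLayout.length * memoryPerExecutorGb} GB")
  // normal:   6 cores, 12 GB
  // observed: 6 cores, 24 GB

So even though the long job still only uses its 6 cores, it has silently doubled its memory footprint, leaving ~1.4 GB free, which is not enough for the other job's 6 GB per worker.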
My wild guess is that for some reason 1 executor of the long job failed, so the job became 3 cores short and asked the scheduler for 3 more cores, and the scheduler distributed them evenly across the slaves: 2 cores + 1 core. But this distribution can't actually take effect until the short job finishes, because the short job holds the rest of the memory. This would explain the 3 + 1 on one slave, but it doesn't explain the 1 + 1 on the other.

Did anyone experience anything similar to this? Any ideas how to avoid it?

Thanks,
Mikhail