Subject: Re: Local Mode: Executor thread leak?
From: Shixiong Zhu <zsxwing@gmail.com>
To: Richard Marscher <rmarscher@localytics.com>
Cc: user <user@spark.apache.org>
Date: Tue, 8 Dec 2015 16:38:13 -0800

Could you send a PR to fix it? Thanks!

Best Regards,
Shixiong Zhu

2015-12-08 13:31 GMT-08:00 Richard Marscher <rmarscher@localytics.com>:

> Alright, I was able to work through the problem.
>
> The owning thread was one from an "Executor task launch worker", which at
> least in local mode runs the task and the related user code of the task.
> After giving every thread in the user code's pools a distinctive name
> (with a custom ThreadFactory), I was able to trace the leak to a couple of
> thread pools that were not shut down properly: the named threads kept
> accumulating in thread dumps of the JVM process.
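>
> The factory was along these lines (a minimal Scala sketch for
> illustration; the pool name here is hypothetical, not our real code):
>
>     import java.util.concurrent.{Executors, ThreadFactory}
>     import java.util.concurrent.atomic.AtomicInteger
>
>     // Give each thread a recognizable name so a leaked thread can be
>     // attributed to its owning pool in a jstack / thread dump.
>     class NamedThreadFactory(poolName: String) extends ThreadFactory {
>       private val counter = new AtomicInteger(0)
>       override def newThread(r: Runnable): Thread =
>         new Thread(r, s"$poolName-thread-${counter.incrementAndGet()}")
>     }
>
>     // Threads now show up in dumps as e.g. "lookup-pool-thread-1"
>     // instead of the anonymous default "pool-NNN-thread-1".
>     val pool = Executors.newFixedThreadPool(4, new NamedThreadFactory("lookup-pool"))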
>
> On Mon, Dec 7, 2015 at 6:41 PM, Richard Marscher <rmarscher@localytics.com> wrote:
>
>> Thanks for the response.
>>
>> The version is Spark 1.5.2.
>>
>> Some examples of the thread names:
>>
>> pool-1061-thread-1
>> pool-1059-thread-1
>> pool-1638-thread-1
>>
>> Hundreds and then thousands of these end up stranded in WAITING.
>>
>> I added logging to try to track the lifecycle of the thread pool in
>> Executor, as mentioned before. Here is an excerpt, and everything seems
>> fine there: every executor that starts is also shut down, and the
>> shutdown looks clean.
>>
>> 15/12/07 23:30:21 WARN o.a.s.e.Executor: Threads finished in executor driver. pool shut down java.util.concurrent.ThreadPoolExecutor@e5d036b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
>> 15/12/07 23:30:28 WARN o.a.s.e.Executor: Executor driver created, thread pool: java.util.concurrent.ThreadPoolExecutor@3bc41ae3[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
>> 15/12/07 23:31:06 WARN o.a.s.e.Executor: Threads finished in executor driver. pool shut down java.util.concurrent.ThreadPoolExecutor@3bc41ae3[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 36]
>> 15/12/07 23:31:11 WARN o.a.s.e.Executor: Executor driver created, thread pool: java.util.concurrent.ThreadPoolExecutor@6e85ece4[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
>> 15/12/07 23:34:35 WARN o.a.s.e.Executor: Threads finished in executor driver. pool shut down java.util.concurrent.ThreadPoolExecutor@6e85ece4[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 288]
>>
>> Here is an example thread dump of one such thread:
>>
>> "pool-493-thread-1" prio=10 tid=0x00007f0e60612800 nid=0x18c4 waiting on condition [0x00007f0c33c3e000]
>>    java.lang.Thread.State: WAITING (parking)
>>         at sun.misc.Unsafe.park(Native Method)
>>         - parking to wait for <0x00007f10b3e8fb60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> On Mon, Dec 7, 2015 at 6:23 PM, Shixiong Zhu <zsxwing@gmail.com> wrote:
>>
>>> Which version are you using? Could you post these thread names here?
>>>
>>> Best Regards,
>>> Shixiong Zhu
>>>
>>> 2015-12-07 14:30 GMT-08:00 Richard Marscher <rmarscher@localytics.com>:
>>>
>>>> Hi,
>>>>
>>>> I've been running benchmarks against Spark in local mode in a
>>>> long-running process, and I'm seeing threads leak each time it runs a
>>>> job. It doesn't matter whether I recycle the SparkContext constantly
>>>> or keep one context alive for the entire application lifetime.
>>>>
>>>> I see an ongoing accumulation of "pool-xxxx-thread-1" threads whose
>>>> creating thread is "Executor task launch worker-xx", where the x's are
>>>> numbers. The number of leaks per launch worker varies but is usually
>>>> one to a few.
>>>>
>>>> Searching the Spark code, the pool is created in the Executor class
>>>> and `.shutdown()` is called on it in the executor's stop(). I've wired
>>>> up logging and also tried shutdownNow() and awaitTermination() on the
>>>> pools. Everything seems okay there for every Executor whose `stop()`
>>>> is called, but I'm not yet sure that every Executor gets stopped that
>>>> way, which I am looking into now.
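>>>>
>>>> The shutdown I'm testing with follows the standard JDK idiom, roughly
>>>> (a Scala sketch, not Spark's exact code; the 10-second timeout is
>>>> arbitrary):
>>>>
>>>>     import java.util.concurrent.{ExecutorService, TimeUnit}
>>>>
>>>>     // Standard graceful shutdown: stop accepting tasks, wait, then
>>>>     // force-cancel whatever is still running and wait again.
>>>>     def shutdownPool(pool: ExecutorService): Unit = {
>>>>       pool.shutdown()
>>>>       if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
>>>>         pool.shutdownNow()
>>>>         pool.awaitTermination(10, TimeUnit.SECONDS)
>>>>       }
>>>>     }
>>>>
>>>> shutdownNow() interrupts the running workers, which is what finally
>>>> frees a pool whose tasks would otherwise block forever.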
>>>>
>>>> What I'm curious to know is whether anyone else has seen a similar
>>>> issue?
>
> --
> Richard Marscher
> Software Engineer
> Localytics
> Localytics.com | Our Blog | Twitter | Facebook | LinkedIn