Subject: Re: Local Mode: Executor thread leak?
From: Shixiong Zhu <zsxwing@gmail.com>
To: Richard Marscher <rmarscher@localytics.com>
Cc: user <user@spark.apache.org>
Date: Tue, 8 Dec 2015 16:38:13 -0800

Could you send a PR to fix it? Thanks!

Best Regards,
Shixiong Zhu

2015-12-08 13:31 GMT-08:00 Richard Marscher <rmarscher@localytics.com>:

> Alright, I was able to work through the problem.
>
> The owning thread was one from an "Executor task launch worker", which at
> least in local mode runs the task and the related user code of the task.
> After giving every thread in the user code's pools a distinctive name
> (with a custom ThreadFactory), I was able to trace the leak to a couple of
> thread pools that were not shut down properly: the named threads kept
> accumulating in thread dumps of the JVM process.
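>
> The factory was along these lines (a minimal Scala sketch for
> illustration; the pool name here is hypothetical, not our real code):
>
>     import java.util.concurrent.{Executors, ThreadFactory}
>     import java.util.concurrent.atomic.AtomicInteger
>
>     // Give each thread a recognizable name so a leaked thread can be
>     // attributed to its owning pool in a jstack / thread dump.
>     class NamedThreadFactory(poolName: String) extends ThreadFactory {
>       private val counter = new AtomicInteger(0)
>       override def newThread(r: Runnable): Thread =
>         new Thread(r, s"$poolName-thread-${counter.incrementAndGet()}")
>     }
>
>     // Threads now show up in dumps as e.g. "lookup-pool-thread-1"
>     // instead of the anonymous default "pool-NNN-thread-1".
>     val pool = Executors.newFixedThreadPool(4, new NamedThreadFactory("lookup-pool"))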
>
> On Mon, Dec 7, 2015 at 6:41 PM, Richard Marscher <rmarscher@localytics.com> wrote:
>
>> Thanks for the response.
>>
>> The version is Spark 1.5.2.
>>
>> Some examples of the thread names:
>>
>> pool-1061-thread-1
>> pool-1059-thread-1
>> pool-1638-thread-1
>>
>> Hundreds and then thousands of these end up stranded in WAITING.
>>
>> I added logging to try to track the lifecycle of the thread pool in
>> Executor, as mentioned before. Here is an excerpt, and everything seems
>> fine there: every executor that starts is also shut down, and the
>> shutdown looks clean.
>>
>> 15/12/07 23:30:21 WARN o.a.s.e.Executor: Threads finished in executor driver. pool shut down java.util.concurrent.ThreadPoolExecutor@e5d036b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
>> 15/12/07 23:30:28 WARN o.a.s.e.Executor: Executor driver created, thread pool: java.util.concurrent.ThreadPoolExecutor@3bc41ae3[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
>> 15/12/07 23:31:06 WARN o.a.s.e.Executor: Threads finished in executor driver. pool shut down java.util.concurrent.ThreadPoolExecutor@3bc41ae3[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 36]
>> 15/12/07 23:31:11 WARN o.a.s.e.Executor: Executor driver created, thread pool: java.util.concurrent.ThreadPoolExecutor@6e85ece4[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
>> 15/12/07 23:34:35 WARN o.a.s.e.Executor: Threads finished in executor driver. pool shut down java.util.concurrent.ThreadPoolExecutor@6e85ece4[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 288]
>>
>> Here is an example thread dump of one such thread:
>>
>> "pool-493-thread-1" prio=10 tid=0x00007f0e60612800 nid=0x18c4 waiting on condition [0x00007f0c33c3e000]
>>    java.lang.Thread.State: WAITING (parking)
>>         at sun.misc.Unsafe.park(Native Method)
>>         - parking to wait for <0x00007f10b3e8fb60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> On Mon, Dec 7, 2015 at 6:23 PM, Shixiong Zhu <zsxwing@gmail.com> wrote:
>>
>>> Which version are you using? Could you post these thread names here?
>>>
>>> Best Regards,
>>> Shixiong Zhu
>>>
>>> 2015-12-07 14:30 GMT-08:00 Richard Marscher <rmarscher@localytics.com>:
>>>
>>>> Hi,
>>>>
>>>> I've been running benchmarks against Spark in local mode in a
>>>> long-running process, and I'm seeing threads leak each time it runs a
>>>> job. It doesn't matter whether I recycle the SparkContext constantly
>>>> or keep one context alive for the entire application lifetime.
>>>>
>>>> I see an ongoing accumulation of "pool-xxxx-thread-1" threads whose
>>>> creating thread is "Executor task launch worker-xx", where the x's are
>>>> numbers. The number of leaks per launch worker varies but is usually
>>>> one to a few.
>>>>
>>>> Searching the Spark code, the pool is created in the Executor class
>>>> and `.shutdown()` is called on it in the executor's stop(). I've wired
>>>> up logging and also tried shutdownNow() and awaitTermination() on the
>>>> pools. Everything seems okay there for every Executor whose `stop()`
>>>> is called, but I'm not yet sure that every Executor gets stopped that
>>>> way, which I am looking into now.
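>>>>
>>>> The shutdown I'm testing with follows the standard JDK idiom, roughly
>>>> (a Scala sketch, not Spark's exact code; the 10-second timeout is
>>>> arbitrary):
>>>>
>>>>     import java.util.concurrent.{ExecutorService, TimeUnit}
>>>>
>>>>     // Standard graceful shutdown: stop accepting tasks, wait, then
>>>>     // force-cancel whatever is still running and wait again.
>>>>     def shutdownPool(pool: ExecutorService): Unit = {
>>>>       pool.shutdown()
>>>>       if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
>>>>         pool.shutdownNow()
>>>>         pool.awaitTermination(10, TimeUnit.SECONDS)
>>>>       }
>>>>     }
>>>>
>>>> shutdownNow() interrupts the running workers, which is what finally
>>>> frees a pool whose tasks would otherwise block forever.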
>>>>
>>>> What I'm curious to know is whether anyone else has seen a similar
>>>> issue?
>
> --
> Richard Marscher
> Software Engineer
> Localytics
> Localytics.com | Our Blog | Twitter | Facebook | LinkedIn