spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shixiong Zhu <zsxw...@gmail.com>
Subject Re: Local Mode: Executor thread leak?
Date Wed, 09 Dec 2015 00:38:13 GMT
Could you send a PR to fix it? Thanks!

Best Regards,
Shixiong Zhu

2015-12-08 13:31 GMT-08:00 Richard Marscher <rmarscher@localytics.com>:

> Alright I was able to work through the problem.
>
> So the owning thread was one from the executor task launch worker, which
> at least in local mode runs the task and the related user code of the task.
> After judiciously naming every thread in the pools in the user code (with a
> custom ThreadFactory) I was able to trace down the leak to a couple thread
> pools that were not shut down properly by noticing the named threads
> accumulating in thread dumps of the JVM process.
>
> On Mon, Dec 7, 2015 at 6:41 PM, Richard Marscher <rmarscher@localytics.com
> > wrote:
>
>> Thanks for the response.
>>
>> The version is Spark 1.5.2.
>>
>> Some examples of the thread names:
>>
>> pool-1061-thread-1
>> pool-1059-thread-1
>> pool-1638-thread-1
>>
>> There become hundreds then thousands of these stranded in WAITING.
>>
>> I added logging to try to track the lifecycle of the thread pool in
>> Executor as mentioned before. Here is an excerpt, but every seems fine
>> there. Every executor that starts is also shut down and it seems like it
>> shuts down fine.
>>
>> 15/12/07 23:30:21 WARN o.a.s.e.Executor: Threads finished in executor
>> driver. pool shut down java.util.concurrent.ThreadPoolExecutor@e5d036b[Terminated,
>> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
>> 15/12/07 23:30:28 WARN o.a.s.e.Executor: Executor driver created, thread
>> pool: java.util.concurrent.ThreadPoolExecutor@3bc41ae3[Running, pool
>> size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
>> 15/12/07 23:31:06 WARN o.a.s.e.Executor: Threads finished in executor
>> driver. pool shut down java.util.concurrent.ThreadPoolExecutor@3bc41ae3[Terminated,
>> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 36]
>> 15/12/07 23:31:11 WARN o.a.s.e.Executor: Executor driver created, thread
>> pool: java.util.concurrent.ThreadPoolExecutor@6e85ece4[Running, pool
>> size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
>> 15/12/07 23:34:35 WARN o.a.s.e.Executor: Threads finished in executor
>> driver. pool shut down java.util.concurrent.ThreadPoolExecutor@6e85ece4[Terminated,
>> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 288]
>>
>> Also here is an example thread dump of such a thread:
>>
>> "pool-493-thread-1" prio=10 tid=0x00007f0e60612800 nid=0x18c4 waiting on
>> condition [0x00007f0c33c3e000]
>>    java.lang.Thread.State: WAITING (parking)
>>         at sun.misc.Unsafe.park(Native Method)
>>         - parking to wait for  <0x00007f10b3e8fb60> (a
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>         at
>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>>         at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>>         at
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> On Mon, Dec 7, 2015 at 6:23 PM, Shixiong Zhu <zsxwing@gmail.com> wrote:
>>
>>> Which version are you using? Could you post these thread names here?
>>>
>>> Best Regards,
>>> Shixiong Zhu
>>>
>>> 2015-12-07 14:30 GMT-08:00 Richard Marscher <rmarscher@localytics.com>:
>>>
>>>> Hi,
>>>>
>>>> I've been running benchmarks against Spark in local mode in a long
>>>> running process. I'm seeing threads leaking each time it runs a job. It
>>>> doesn't matter if I recycle SparkContext constantly or have 1 context stay
>>>> alive for the entire application lifetime.
>>>>
>>>> I see a huge accumulation ongoing of "pool-xxxx-thread-1" threads with
>>>> the creating thread "Executor task launch worker-xx" where x's are numbers.
>>>> The number of leaks per launch worker varies but usually 1 to a few.
>>>>
>>>> Searching the Spark code the pool is created in the Executor class. It
>>>> is `.shutdown()` in the stop for the executor. I've wired up logging and
>>>> also tried shutdownNow() and awaitForTermination on the pools. Every seems
>>>> okay there for every Executor that is called with `stop()` but I'm still
>>>> not sure yet if every Executor is called as such, which I am looking into
>>>> now.
>>>>
>>>> What I'm curious to know is if anyone has seen a similar issue?
>>>>
>>>> --
>>>> *Richard Marscher*
>>>> Software Engineer
>>>> Localytics
>>>> Localytics.com <http://localytics.com/> | Our Blog
>>>> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics>
>>>>  | Facebook <http://facebook.com/localytics> | LinkedIn
>>>> <http://www.linkedin.com/company/1148792?trk=tyah>
>>>>
>>>
>>>
>>
>>
>> --
>> *Richard Marscher*
>> Software Engineer
>> Localytics
>> Localytics.com <http://localytics.com/> | Our Blog
>> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics>
|
>> Facebook <http://facebook.com/localytics> | LinkedIn
>> <http://www.linkedin.com/company/1148792?trk=tyah>
>>
>
>
>
> --
> *Richard Marscher*
> Software Engineer
> Localytics
> Localytics.com <http://localytics.com/> | Our Blog
> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics> |
> Facebook <http://facebook.com/localytics> | LinkedIn
> <http://www.linkedin.com/company/1148792?trk=tyah>
>

Mime
View raw message