spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aniket Bhatnagar <aniket.bhatna...@gmail.com>
Subject Re: Very long pause/hang at end of execution
Date Wed, 16 Nov 2016 10:44:04 GMT
Thanks for sharing the thread dump. I had a look at them and couldn't find
anything unusual. Is there anything in the logs (driver + executor) that
suggests what's going on? Also, what does the spark job do and what is the
version of spark and hadoop you are using?

Thanks,
Aniket

On Wed, Nov 16, 2016 at 2:07 AM Michael Johnson <mjjohnson.geo@yahoo.com>
wrote:

> The extremely long hand/pause has started happening again. I've been
> running on a small remote cluster, so I used the UI to grab thread dumps
> rather than doing it from the command line. There seems to be one executor
> still alive, along with the driver; I grabbed 4 thread dumps from each, a
> couple of seconds apart. I'd greatly appreciate any help tracking down
> what's going on! (I've attached them, but I can paste them somewhere if
> that's more convenient.)
>
> Thanks,
> Michael
>
>
>
>
> On Sunday, November 6, 2016 10:49 PM, Michael Johnson
> <mjjohnson.geo@yahoo.com.INVALID> wrote:
>
>
> Hm. Something must have changed, as it was happening quite consistently
> and now I can't get it to reproduce. Thank you for the offer, and if it
> happens again I will try grabbing thread dumps and I will see if I can
> figure out what is going on.
>
>
> On Sunday, November 6, 2016 10:02 AM, Aniket Bhatnagar <
> aniket.bhatnagar@gmail.com> wrote:
>
>
> I doubt it's GC as you mentioned that the pause is several minutes. Since
> it's reproducible in local mode, can you run the spark application locally
> and once your job is complete (and application appears paused), can you
> take 5 thread dumps (using jstack or jcmd on the local spark JVM process)
> with 1 second delay between each dump and attach them? I can take a look.
>
> Thanks,
> Aniket
>
> On Sun, Nov 6, 2016 at 2:21 PM Michael Johnson <mjjohnson.geo@yahoo.com>
> wrote:
>
> Thanks; I tried looking at the thread dumps for the driver and the one
> executor that had that option in the UI, but I'm afraid I don't know how to
> interpret what I saw...  I don't think it could be my code directly, since
> at this point my code has all completed? Could GC be taking that long?
>
> (I could also try grabbing the thread dumps and pasting them here, if that
> would help?)
>
> On Sunday, November 6, 2016 8:36 AM, Aniket Bhatnagar <
> aniket.bhatnagar@gmail.com> wrote:
>
>
> In order to know what's going on, you can study the thread dumps either
> from spark UI or from any other thread dump analysis tool.
>
> Thanks,
> Aniket
>
> On Sun, Nov 6, 2016 at 1:31 PM Michael Johnson
> <mjjohnson.geo@yahoo.com.invalid> wrote:
>
> I'm doing some processing and then clustering of a small dataset (~150
> MB). Everything seems to work fine, until the end; the last few lines of my
> program are log statements, but after printing those, nothing seems to
> happen for a long time...many minutes; I'm not usually patient enough to
> let it go, but I think one time when I did just wait, it took over an hour
> (and did eventually exit on its own). Any ideas on what's happening, or how
> to troubleshoot?
>
> (This happens both when running locally, using the localhost mode, as well
> as on a small cluster with four 4-processor nodes each with 15GB of RAM; in
> both cases the executors have 2GB+ of RAM, and none of the inputs/outputs
> on any of the stages is more than 75 MB...)
>
> Thanks,
> Michael
>
>
>
>
>
>
>
>

Mime
View raw message