giraph-user mailing list archives

From Young Han <young....@uwaterloo.ca>
Subject Re: Java Process Memory Leak
Date Mon, 17 Mar 2014 22:36:25 GMT
Interesting find. It looks like that bit was added recently (
https://reviews.apache.org/r/17644/diff/3/) and so was not part of Giraph
1.0.0 as far as I can tell.

Also, if anyone cares, a clunky (Ubuntu) workaround I'm using is:

  kill $(ps aux | grep "[j]obcache/job_[0-9]\{12\}_[0-9]\{4\}/" | awk '{print $2}')

Thanks,
Young



On Mon, Mar 17, 2014 at 6:10 PM, Craig Muchinsky <cmuchins@us.ibm.com> wrote:

> I just noticed a similar problem myself. I did a thread dump and found
> similar netty client threads lingering. After poking around the source a
> bit, I'm wondering if the problem is related to this bit of code I found in
> the NettyClient.stop() method:
>
>             workerGroup.shutdownGracefully();
>             ProgressableUtils.awaitTerminationFuture(executionGroup, context);
>             if (executionGroup != null) {
>               executionGroup.shutdownGracefully();
>               ProgressableUtils.awaitTerminationFuture(executionGroup, context);
>             }
>
> Notice that the first await termination call seems to be waiting on the
> executionGroup instead of the workerGroup...
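The ordering Craig describes can be illustrated with a minimal sketch. This is not Giraph's code: plain java.util.concurrent pools stand in for the Netty groups in the quoted snippet, and the class and method names (ShutdownOrderSketch, shutdownAndAwait, stop) are hypothetical. The point is only that each pool is shut down and then awaited itself, so the wait after the worker pool's shutdown targets the worker pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only: ExecutorService is a stand-in for the Netty groups,
// and none of these names come from Giraph.
class ShutdownOrderSketch {

  /** Shuts down one pool and waits for that same pool to terminate. */
  static boolean shutdownAndAwait(ExecutorService group, long timeoutMs) {
    if (group == null) {
      return true; // nothing to stop
    }
    group.shutdown();
    try {
      return group.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      return false;
    }
  }

  /** Stops both pools in turn; executionGroup may be null. */
  static boolean stop(ExecutorService workerGroup,
                      ExecutorService executionGroup,
                      long timeoutMs) {
    // Wait on the worker pool here, not on the execution pool.
    boolean workerDone = shutdownAndAwait(workerGroup, timeoutMs);
    boolean executionDone = shutdownAndAwait(executionGroup, timeoutMs);
    return workerDone && executionDone;
  }

  public static void main(String[] args) {
    ExecutorService workerGroup = Executors.newFixedThreadPool(2);
    ExecutorService executionGroup = Executors.newFixedThreadPool(2);
    workerGroup.execute(() -> { });
    executionGroup.execute(() -> { });
    System.out.println("all pools terminated: "
        + stop(workerGroup, executionGroup, 5000));
  }
}
```

Pairing the shutdown and the wait in one helper makes it hard to await the wrong pool by accident, and the null check covers the case where no execution pool was created.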
>
> Craig M.
>
>
>
> From:        Young Han <young.han@uwaterloo.ca>
> To:        user@giraph.apache.org
> Date:        03/17/2014 03:25 PM
> Subject:        Re: Java Process Memory Leak
> ------------------------------
>
>
>
> Oh, I see. I did jstack on a cluster of machines and a single machine...
> I'm not quite sure how to interpret the output. My best guess is that there
> might be a deadlock---there's just a bunch of Netty threads waiting. The
> links to the jstack dumps:
>
> http://pastebin.com/0cLuaF07 (PageRank, single worker, amazon0505 graph
> from SNAP)
> http://pastebin.com/MNEUELui (MST, from one of the 64 workers, com-orkut
> graph from SNAP)
>
> Any idea what's happening? Or anything in particular I should look for
> next?
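One way to narrow down a dump like this, sketched under assumptions: a JVM only exits on its own once every non-daemon thread has finished, so the live non-daemon threads are the ones worth looking at first, and the JVM can be asked directly whether it sees a deadlock. The class and method names below (LingeringThreadSketch, liveNonDaemonThreads, hasDeadlock) are hypothetical, and the code would have to run inside the affected process, which may not be practical for a Hadoop task. A jstack dump carries the same information: daemon threads are marked as such, and detected deadlocks are reported at the end.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Illustration only: these names are hypothetical, not part of Giraph or Hadoop.
class LingeringThreadSketch {

  /** Lists live non-daemon threads, excluding the calling thread. */
  static List<String> liveNonDaemonThreads() {
    List<String> names = new ArrayList<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
        names.add(t.getName() + " [" + t.getState() + "]");
      }
    }
    return names;
  }

  /** Returns true if the JVM reports a cycle of deadlocked threads. */
  static boolean hasDeadlock() {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    // findDeadlockedThreads also covers java.util.concurrent locks, but it
    // is only available when synchronizer usage monitoring is supported.
    long[] ids = bean.isSynchronizerUsageSupported()
        ? bean.findDeadlockedThreads()
        : bean.findMonitorDeadlockedThreads();
    return ids != null;
  }

  public static void main(String[] args) {
    for (String name : liveNonDaemonThreads()) {
      System.out.println("keeps JVM alive: " + name);
    }
    System.out.println("deadlock detected: " + hasDeadlock());
  }
}
```

Threads that are merely waiting for work show up in the first list without any deadlock being reported, which points toward a pool that was never shut down rather than a lock cycle.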
>
> Thanks,
> Young
>
>
> On Mon, Mar 17, 2014 at 12:19 PM, Avery Ching <aching@apache.org> wrote:
> Hi Young,
>
> Our Hadoop instance (Corona) kills processes after they finish executing
> so we don't see this.  You might want to do a jstack to see where it's hung
> up on and figure out the issue.
>
> Thanks
>
> Avery
>
>
> On 3/17/14, 7:56 AM, Young Han wrote:
> Hi all,
>
> With Giraph 1.0.0, I've noticed an issue where the Java process
> corresponding to the job loiters around indefinitely even after the job
> completes (successfully). The process consumes memory but not CPU time.
> This happens on both a single machine and clusters of machines (in which
> case every worker has the issue). The only way I know of fixing this is
> killing the Java process manually---restarting or stopping Hadoop does not
> help.
>
> Is this some known bug or a configuration issue on my end?
>
> Thanks,
> Young
>
>
>
