hadoop-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject RE: What to do/check/debug/root cause analysis when jobtracker hang
Date Thu, 07 Feb 2013 02:12:25 GMT

Our cluster had the same problem on cdh3u4. I think it is caused by a bug in the JobTracker,
and I believe Cloudera knows about the issue.
Since upgrading to cdh3u5 we haven't hit it again, but I am not sure whether it is confirmed
as fixed in cdh3u5.
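
On the question quoted below about finding out what the one busy thread is doing: the thread id that top -H shows is decimal, while jstack prints the same native id in hex as nid=0x... on each thread's header line. Converting one to the other lets you pick out the matching stack. The following is only a sketch of that lookup; the function name and the sample dump are made up for illustration and are not taken from any real JobTracker.

```python
import re
from typing import Optional

def find_thread_stack(jstack_output: str, tid: int) -> Optional[str]:
    """Return the jstack block whose nid matches the decimal thread id from top -H."""
    # jstack prints the native thread id in lowercase hex, e.g. nid=0x6f5
    pattern = re.compile(r"\bnid=" + re.escape(hex(tid)) + r"\b")
    lines = jstack_output.splitlines()
    for i, line in enumerate(lines):
        # Thread header lines start with the quoted thread name
        if line.startswith('"') and pattern.search(line):
            block = [line]
            for nxt in lines[i + 1:]:
                # A blank line or the next thread header ends this block
                if not nxt.strip() or nxt.startswith('"'):
                    break
                block.append(nxt)
            return "\n".join(block)
    return None

# Illustrative dump only (hypothetical thread names, ids and addresses)
SAMPLE_DUMP = '''\
"IPC Server handler 3 on 8021" daemon prio=10 tid=0x00007f1c2c0a1000 nid=0x6f4 waiting on condition [0x00007f1c1a2e3000]
   java.lang.Thread.State: WAITING (parking)
\tat sun.misc.Unsafe.park(Native Method)

"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x00007f1c2c05a800 nid=0x6f5 runnable
'''

if __name__ == "__main__":
    # 1781 stands in for the thread id reported by top -H -p <jobtracker_pid>
    print(find_thread_stack(SAMPLE_DUMP, 1781))
```

If the matching block turns out to be a GC thread rather than Java code with a stack trace, that would suggest the JVM is spending its time collecting garbage, which would be consistent with OOM showing up in the log. Taking several dumps a few seconds apart and checking whether the same frames keep appearing helps separate a real spin from a momentary snapshot.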

> Date: Mon, 4 Feb 2013 15:21:18 -0800
> Subject: What to do/check/debug/root cause analysis when jobtracker hang
> From: silvianhadoop@gmail.com
> To: user@hadoop.apache.org
> Lately, the jobtracker in one of our production clusters has been falling into a hung state.
> The 1, 5 and 15 minute load averages are around 1;
> in top, the jobtracker process is at 100% CPU all the time.
> So I ran top -H -p jobtracker_pid, and I always see a single
> thread at 100% CPU.
> Unless we restart the jobtracker, the hang never goes away.
> I found OOM errors in the jobtracker log file during the hang.
> How can I find out what is really going on in the one and only
> thread that is at 100% CPU?
> How can I prove whether we ran out of memory because of the number of jobs
> _OR_
> because there is a memory leak on the application side?
> I tried a jstack dump and http://jobtracker:50030/stacks,
> but I don't know what I should look at in the output of those commands.
> The cluster is cdh3u4 on CentOS 6.2, with transparent_hugepage disabled.
> Hopefully this makes sense,
> -P
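
For the second question above (job volume versus a leak), one way to gather evidence is to take two class histograms with jmap -histo <pid> some time apart and see which classes grew the most. The sketch below assumes the usual histogram row layout (rank, instance count, bytes, class name); the helper names and all the numbers in the sample data are invented for illustration.

```python
import re
from typing import Dict, List, Tuple

# Histogram rows look like: "   1:        200000       16000000  [C"
HISTO_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S.*)$")

def parse_histo(text: str) -> Dict[str, int]:
    """Map class name -> bytes held, skipping header, separator and Total lines."""
    sizes = {}
    for line in text.splitlines():
        m = HISTO_ROW.match(line)
        if m:
            sizes[m.group(3).strip()] = int(m.group(2))
    return sizes

def growth(before: str, after: str, top: int = 5) -> List[Tuple[str, int]]:
    """Return the classes with the largest byte growth between two histograms."""
    old, new = parse_histo(before), parse_histo(after)
    deltas = {name: new[name] - old.get(name, 0) for name in new}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Made-up snapshots; the counts and sizes are not real measurements
BEFORE = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:        200000       16000000  [C
   2:         50000        4000000  org.apache.hadoop.mapred.TaskInProgress
   3:           500          60000  org.apache.hadoop.mapred.JobInProgress
Total        250500       20060000
"""

AFTER = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:        210000       16800000  [C
   2:        150000       12000000  org.apache.hadoop.mapred.TaskInProgress
   3:          1500         180000  org.apache.hadoop.mapred.JobInProgress
Total        361500       28980000
"""

if __name__ == "__main__":
    for name, delta in growth(BEFORE, AFTER):
        print(f"{delta:>12}  {name}")
```

If the growth is mostly in per-job and per-task bookkeeping classes and tracks the number of jobs submitted, that points toward job volume. Steady growth in classes unrelated to the job count would point more toward a leak. Either way this is only evidence, not proof, and a heap dump would be needed to see what is holding the objects.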