hadoop-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject RE: What to do/check/debug/root cause analysis when jobtracker hang
Date Thu, 07 Feb 2013 02:12:25 GMT

Our cluster on cdh3u4 had the same problem. I think it is caused by a bug in the JobTracker.
I believe Cloudera knows about this issue.
Since upgrading to cdh3u5 we haven't hit the issue again, but I am not sure whether it is
confirmed as fixed in cdh3u5.
Yong

> Date: Mon, 4 Feb 2013 15:21:18 -0800
> Subject: What to do/check/debug/root cause analysis when jobtracker hang
> From: silvianhadoop@gmail.com
> To: user@hadoop.apache.org
> 
> Lately, the jobtracker in one of our production clusters has been falling into a hung state.
> The load average (5, 10, 15 min) is around 1;
> in top, the jobtracker process shows 100% CPU all the time.
> 
> So I went ahead and tried top -H -p jobtracker_pid, and I always see one
> thread that is at 100% CPU all the time.
> 
> Unless we restart the jobtracker, the hang never goes away.
> 
> I found OOM errors in the jobtracker log file during the hang.
> 
> How can I find out what is really going on in the one and only
> thread that is at 100% CPU?
> 
> How can I prove whether we ran out of memory because of the number of jobs
> _OR_
> because there is a memory leak on the application side?
> 
> 
> I tried a jstack dump, and http://jobtracker:50030/stacks
> 
> I just don't know what I should really look for in the output of those commands.
> 
> The cluster is cdh3u4, on CentOS 6.2, with transparent_hugepage disabled.
> 
> 
> 
> Hopefully this makes sense,
> -P
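
On the question of what the one busy thread is doing: the thread IDs that top -H prints are decimal, while jstack labels each thread with the same native ID in hex, as nid=0x.... Converting one to the other lets you pick the busy thread's stack out of the dump. Below is a minimal Python sketch of that lookup, assuming the jstack output has been saved as text. The function name find_thread_by_tid, the thread names, and the sample dump are hypothetical, made up for illustration and not taken from a real jobtracker.

```python
import re

def find_thread_by_tid(jstack_text, tid):
    """Return the jstack stanza whose nid matches a decimal thread ID from `top -H`.

    Hypothetical helper: returns None when no stanza header carries that nid.
    """
    nid = "nid=" + hex(tid)  # e.g. 12345 -> "nid=0x3039"
    # Thread stanzas in a jstack dump are separated by blank lines.
    for stanza in re.split(r"\n\s*\n", jstack_text):
        stanza = stanza.strip()
        if not stanza:
            continue
        header = stanza.splitlines()[0]
        # \b keeps 0x3039 from also matching a longer ID such as 0x30391.
        if re.search(re.escape(nid) + r"\b", header):
            return stanza
    return None

# Illustrative dump only; the names and frames below are invented.
SAMPLE = '''\
"hypothetical-idle-thread" daemon prio=10 tid=0x00007f0001 nid=0x3038 waiting on condition
   java.lang.Thread.State: WAITING (parking)

"hypothetical-busy-thread" prio=10 tid=0x00007f0002 nid=0x3039 runnable
   java.lang.Thread.State: RUNNABLE
    at com.example.Hypothetical.loop(Hypothetical.java:1)
'''

if __name__ == "__main__":
    # 12345 stands in for the thread ID that top -H shows at 100% CPU.
    print(find_thread_by_tid(SAMPLE, 12345))
```

If the matching stanza turns out to be a GC thread rather than application code, that would be consistent with the OOM messages in the log, though on its own it does not tell a leak apart from a workload that is simply too large for the heap.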
 		 	   		  