hadoop-common-user mailing list archives

From "Bobby Dennett" <softw...@bobby.fastmail.us>
Subject RE: Hadoop JobTracker Hanging
Date Mon, 21 Jun 2010 19:49:33 GMT
Thanks, all, for your suggestions. (Please note that Tan is my co-worker;
we are both working to resolve this issue.) We experienced another hang
this weekend and increased the HADOOP_HEAPSIZE setting to 6000 (MB),
since we periodically see "java.lang.OutOfMemoryError: Java heap space"
errors in the jobtracker log. We are now reviewing the resource
allocation on the master node to make sure the larger heap isn't causing
problems of its own. In parallel, we are building "beefier" servers
(faster CPUs, 3x more memory) for the node running the primary namenode
and jobtracker processes, as well as for the secondary namenode.
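To see how often the heap is actually running out (and whether the larger HADOOP_HEAPSIZE helps), it can be useful to count the "java.lang.OutOfMemoryError" entries in the jobtracker log before and after the change. Below is a minimal sketch, assuming the log is plain text and each OOME shows up on a line containing the string "java.lang.OutOfMemoryError"; the function names are hypothetical and the log path you pass in depends on your installation.

```python
import re
from typing import Iterable, List, Tuple

# Matches e.g. "java.lang.OutOfMemoryError: Java heap space" and captures
# the detail message after the colon, if there is one.
OOME_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError(?::\s*(.*))?")


def find_oome_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, detail) pairs for every OutOfMemoryError mention."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = OOME_PATTERN.search(line)
        if match:
            # Detail is empty when the line has no ": message" part.
            hits.append((lineno, (match.group(1) or "").strip()))
    return hits


def report(path: str) -> int:
    """Print each OutOfMemoryError hit in the log at `path`; return the count."""
    with open(path, encoding="utf-8", errors="replace") as log:
        hits = find_oome_lines(log)
    for lineno, detail in hits:
        print(f"{path}:{lineno}: OutOfMemoryError: {detail or '(no detail)'}")
    return len(hits)
```

Calling report() on the jobtracker log (and on its rotated copies) gives a rough per-day count; the detail text also separates "Java heap space" from other causes such as "GC overhead limit exceeded".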

Any additional suggestions you might have for troubleshooting/resolving
this hanging jobtracker issue would be greatly appreciated.

Please note that I had previously started a similar topic on Get
Satisfaction
(http://www.getsatisfaction.com/cloudera/topics/looking_for_troubleshooting_tips_guidance_for_hanging_jobtracker),
where Todd is helping and where the jstack and jmap output can be found.
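Because the hang is intermittent, one option is to capture jstack and jmap output automatically at the moment it happens, rather than waiting for someone to notice. One of the symptoms in Tan's original report (quoted below) is that jobtracker.log stops being updated, so a watchdog can treat a stale log as the trigger. This is only a sketch: the 10-minute threshold, the polling interval, the output directory, and the helper names are all hypothetical, and it assumes the JDK's `jstack` and `jmap` are on PATH and are run as the same user as the JobTracker JVM.

```python
import os
import subprocess
import time
from typing import List, Optional


def log_is_stale(log_path: str, max_idle_seconds: float,
                 now: Optional[float] = None) -> bool:
    """True if log_path has not been modified within max_idle_seconds."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path) > max_idle_seconds


def diagnostic_commands(pid: int) -> List[List[str]]:
    """Commands to run against the JobTracker JVM when it appears hung."""
    return [
        ["jstack", str(pid)],          # thread dump: look for BLOCKED/deadlocked threads
        ["jmap", "-histo", str(pid)],  # per-class histogram of objects on the heap
    ]


def capture(pid: int, out_dir: str) -> List[str]:
    """Run each diagnostic command, saving stdout+stderr to a timestamped file.

    Raises FileNotFoundError if jstack/jmap are not on PATH.
    """
    os.makedirs(out_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    written = []
    for cmd in diagnostic_commands(pid):
        out_path = os.path.join(out_dir, f"{cmd[0]}-{pid}-{stamp}.txt")
        with open(out_path, "w") as out:
            # Bounded timeout so a tool that cannot attach does not block forever.
            subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT,
                           timeout=300, check=False)
        written.append(out_path)
    return written


def watch(log_path: str, pid: int, out_dir: str,
          max_idle_seconds: float = 600.0, poll_seconds: float = 60.0) -> List[str]:
    """Poll until the log goes stale, then capture diagnostics once and return."""
    while not log_is_stale(log_path, max_idle_seconds):
        time.sleep(poll_seconds)
    return capture(pid, out_dir)
```

One caveat with using log modification time as the signal: on a quiet cluster the jobtracker log may legitimately go idle, so the threshold should be longer than the normal gap between log lines, or combined with a check such as whether "hadoop job -list" returns.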

Thanks,
-Bobby

On Fri, 18 Jun 2010 15:04 -0600, "Li, Tan" <tali@shopping.com> wrote:
> Todd,
> I will try to increase the HADOOP_HEAPSIZE to see if that helps.
> Tan
> 
> -----Original Message-----
> From: Todd Lipcon [mailto:todd@cloudera.com] 
> Sent: Thursday, June 17, 2010 5:07 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Hadoop JobTracker Hanging
> 
> Li, just to narrow your search: in my experience this is usually caused
> by an OOME (OutOfMemoryError) on the JT. Check the logs for
> OutOfMemoryError and see what you find. You may need to configure it to
> retain fewer jobs in memory, or increase your heap.
> 
> -Todd
> 
> On Thu, Jun 17, 2010 at 5:03 PM, Li, Tan <tali@shopping.com> wrote:
> 
> > Thanks for your tips, Ted.
> > All of our QA is done on 0.20.1, and I have a feeling it is not
> > version-related.
> > I will run jstack and jmap once the problem happens again and I may need
> > your help to analyze the result.
> >
> > Tan
> >
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: Thursday, June 17, 2010 2:39 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: Hadoop JobTracker Hanging
> >
> > Is upgrading to hadoop-0.20.2+228 possible?
> >
> > Use jstack to get a stack trace of the job tracker process when this
> > happens again.
> > Use jmap to get shared object memory maps or heap memory details.
> >
> > On Thu, Jun 17, 2010 at 2:00 PM, Li, Tan <tali@shopping.com> wrote:
> >
> > > Folks,
> > >
> > > I need some help on job tracker.
> > > I am running two Hadoop clusters (with 30+ nodes each) on Ubuntu. One
> > > runs version 0.19.1 (Apache) and the other runs version 0.20.1+169.68
> > > (Cloudera).
> > >
> > > I have the same problem with both clusters: the job tracker hangs
> > > almost once a day.
> > > Symptom: the job tracker web page cannot be loaded, the command "hadoop
> > > job -list" hangs, and the jobtracker.log file stops being updated.
> > > I cannot find any useful information in the job tracker log file.
> > > The symptom goes away after I restart the job tracker, and the cluster
> > > runs fine for another 20+ hours. Then the symptom comes back.
> > >
> > > I do not have any serious problems with HDFS.
> > >
> > > Any ideas about the cause? Is there any configuration parameter I can
> > > change to reduce the chances of the problem?
> > > Any tips for diagnosing and troubleshooting?
> > >
> > > Thanks!
> > >
> > > Tan
> > >
> >
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera
> 
