hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: NameNode deadlocked (help?)
Date Mon, 17 May 2010 16:42:33 GMT
Brian Bockelman wrote:
> On May 17, 2010, at 5:25 AM, Steve Loughran wrote:
>> Brian Bockelman wrote:
>>> On May 14, 2010, at 8:27 PM, Todd Lipcon wrote:
>>>> Hey Brian,
>>>> Yep, excessive GC definitely sounds like a likely culprit. I'm surprised
>>>> you didn't see OOMEs in the log, though.
>>> We didn't until the third restart today.  I have no clue why we haven't seen
>>> this in the past 9 months of this cluster though...
>>> Anyhow, it looks like this might have done the trick... the sysadmin is heading
>>> over to kick over a few errant datanodes, and we should be able to get out of
>>> safemode soon.  Luckily, it's a 4-day weekend in Europe and otherwise a Friday
>>> evening in the US, so there's only a few folks using it.
>> Good thing we Europeans have long weekends.
> :) Indeed
>>>> If you want to monitor GC, I'd recommend adding -verbose:gc
>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps to your java options -
>>>> occasionally useful for times like this.
>> What are your current GC options? Played with compressed object pointers yet?
> I've been eyeballing them, but haven't had any chance yet.  We probably won't mess with
> them until we start to run out of RAM on the machines themselves.
> This particular instance was a simple oversight - there was no need to try and fit the
> NN into a 1GB heap on a dedicated machine.  I tell folks 1GB RAM per 1M objects.  It's
> almost always an over-estimate but, better safe than deadlocked on a Friday evening...

I've been using compressed pointers on JRockit for a long time. It's a very
nice JVM that never seems to run out of stack when you accidentally
tail recurse without end. Compressed pointers on the Sun JVM are newer, but
I've not had any problems with that part of the JVM, and the benefits in
both memory consumption and possibly in cache hits make them very appealing.
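
To make the sizing rule and the options discussed in this thread concrete, here is a minimal sketch, not a tested NameNode configuration. It applies Brian's "1GB RAM per 1M objects" rule of thumb and prints a matching option string. The class and method names are hypothetical. The three GC logging flags are the ones Todd quoted; -XX:+UseCompressedOops is assumed here to be the switch for compressed object pointers on the Sun JVM (JRockit uses a different option), and rounding up to at least 1 GB is an assumption of this sketch.

```java
// Hypothetical helper: turns the thread's rule of thumb into a JVM option string.
final class NameNodeHeapSketch {

    // Rule of thumb from the thread: roughly 1 GB of heap per 1M namespace objects.
    static final long OBJECTS_PER_GB = 1_000_000L;

    // Suggested heap in whole gigabytes, rounded up, never below 1 (assumption).
    static long suggestedHeapGb(long objects) {
        if (objects < 0) {
            throw new IllegalArgumentException("object count must be non-negative");
        }
        long gb = (objects + OBJECTS_PER_GB - 1) / OBJECTS_PER_GB; // ceiling division
        return Math.max(1L, gb);
    }

    // Heap size plus the GC logging flags quoted in the thread and the
    // (assumed) Sun JVM flag for compressed object pointers.
    static String jvmOptions(long objects) {
        return "-Xmx" + suggestedHeapGb(objects) + "g"
                + " -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
                + " -XX:+UseCompressedOops";
    }

    public static void main(String[] args) {
        // Example object count; pass a real count as the first argument.
        long objects = args.length > 0 ? Long.parseLong(args[0]) : 2_500_000L;
        System.out.println(jvmOptions(objects));
    }
}
```

The printed string could then be pasted into whatever variable carries the NameNode's Java options in your setup. As Brian notes, the rule is almost always an over-estimate, so treat the result as a starting point rather than a measurement.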
