hbase-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: regionserver disconnection
Date Tue, 01 Dec 2009 05:22:33 GMT
What pages do you have that I could look at regarding the GC tuning you
mention?

Having a concrete set of GC issues/details that we can point people to 
would be very helpful IMO, both to help users and to highlight the 
issue from the perspective that Sun really needs to do a better job 
here. If we are pushing the boundaries then perhaps Sun might be 
interested in working more closely with us (pipe dream?). My experience 
talking with non-Java folks these days (C/C++, usually where I justify 
ZK in Java) is that poor GC performance is one of the few strong 
arguments they have left. I'd love to help eliminate that, but today 
they have a point.

I will put you in contact with someone on the HDFS team regarding NN GC.

Patrick

stack wrote:
> I suppose up to now I took it as a given for any Java application that
> wants to do realtime, whether a webserver or a search application, but yeah,
> we should do more to highlight the importance of GC tuning, especially when
> failure to do so can be relatively catastrophic (a RegionServer shutting
> itself down).  Ryan in particular has been doing a bunch of talking up of the
> topic (he did our performance tuning wiki page too).  We could start up a
> list of use cases and the tunings that helped alleviate GC woes for a
> particular cluster profile and loading (So we'd have something to present at
> BAHUG?  Do you know who we might talk to regarding pauses on the MR/HDFS
> team, Patrick?  We were introduced to the NameNode Tuner once... we should
> talk to him again).  It does seem to be a problem where one tuning does not
> suit all deployments.
> 
> Regarding Zhenyu's case, there is still work to do IMO.  What I saw in his
> logs was a failed promotion from ParNew, something that could be helped by
> starting CMS collection earlier (among other things).  He's also still on an
> older version of the JVM.  While things are not timing out at the moment,
> IMO it's still 'broke' if it has such long pauses (Zhenyu, in your GC logs,
> are you seeing 4-minute pauses?).  Ryan would argue these are inevitable
> with CMS -- but at least in the one case that I saw, some twiddling would
> seem to help.
> 
> Thanks Patrick,
> St.Ack
> 
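
A minimal sketch of what "starting CMS collection earlier" could look like
in a RegionServer's hbase-env.sh. The thread does not give concrete values,
so the 70% threshold below is an assumed starting point for illustration,
not a recommendation; the right number depends on heap size and load, and
as noted above one tuning does not suit all deployments.

```shell
# Illustrative hbase-env.sh fragment (values are assumptions, not tuned).

# Use the CMS old-generation collector with the ParNew young collector.
export HBASE_OPTS="${HBASE_OPTS:-} -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"

# Begin CMS cycles once the old generation is about 70% full, and tell the
# JVM to use only this threshold rather than its own heuristics. Starting
# earlier leaves more headroom for promotions out of ParNew.
export HBASE_OPTS="$HBASE_OPTS -XX:CMSInitiatingOccupancyFraction=70"
export HBASE_OPTS="$HBASE_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

# Log GC activity so that long pauses and failed promotions are visible.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```

Lowering the occupancy fraction trades more frequent concurrent collections
for a smaller chance of a promotion failure falling back to a long
stop-the-world collection; the GC log is the way to check whether a given
value actually helps.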
