hbase-user mailing list archives

From "Buttler, David" <buttl...@llnl.gov>
Subject RE: web interface is fragile?
Date Thu, 01 Apr 2010 01:26:35 GMT
Hi J-D,
Thanks for taking a look at this.  The error that I received is:
http://pastebin.com/ZnhVA5B0
This is on the client side.
A little strange, as I have run this task several times in the past, and my client
heap size is already set to 4GB.  I can try doubling it and see if that helps.
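
For reference, here is roughly what I mean by doubling it (just a sketch, assuming the
job is launched with hadoop jar and that bin/hadoop honors HADOOP_HEAPSIZE; the jar and
class names below are placeholders):

# HADOOP_HEAPSIZE is in MB and sets -Xmx for the JVM that bin/hadoop launches,
# i.e. the process whose console showed the error; 8192 doubles my current 4GB.
export HADOOP_HEAPSIZE=8192
hadoop jar my-mr-job.jar com.example.MyJob   # placeholder jar and driver class

# If the error had come from the map/reduce tasks instead, the task JVM heap
# would be set with mapred.child.java.opts in the job configuration (e.g. -Xmx2048m).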
Dave


-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Wednesday, March 31, 2010 6:11 PM
To: hbase-user@hadoop.apache.org
Subject: Re: web interface is fragile?

Dave,

Can you pastebin the exact error that was returned by the MR job? That
looks like it's client-side (from HBase's point of view).

WRT the .META. table and the master, the web page does do a request on every
hit, so if the region is unavailable then you can't see it. Looks like
you kill -9'ed the region server? If so, it takes about a minute to detect
the region server failure and then split the write-ahead logs, so if
.META. was on that machine, it will take that much time to get a
working web page again.
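
The timeout that drives that detection is configurable if a minute is too long
for you. A sketch, assuming a 0.20.x release where the property is called
zookeeper.session.timeout (check hbase-default.xml in your version for the
exact name and default):

<!-- hbase-site.xml, on the region servers -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>30000</value>
  <!-- milliseconds; lowering it means faster failure detection, but a long
       GC pause on a healthy region server is then more likely to expire the
       session and get that server treated as dead -->
</property>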

Instead of kill -9, simply go on the node and run
./bin/hbase-daemon.sh stop regionserver
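
If you want to script a rolling restart that way, something like this would do
it (untested sketch; the node names are placeholders, and it assumes
passwordless ssh and the same HBase install path on every node):

for rs in node1 node2 node3 node4 node5; do
  # hbase-daemon.sh stop waits for the process to exit before returning
  ssh $rs "/path/to/hbase/bin/hbase-daemon.sh stop regionserver"
  ssh $rs "/path/to/hbase/bin/hbase-daemon.sh start regionserver"
  sleep 60   # give the master time to reassign regions before the next node
done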

J-D

On Wed, Mar 31, 2010 at 5:51 PM, Buttler, David <buttler1@llnl.gov> wrote:
> Hi,
> I have a small cluster (6 nodes, 1 master and 5 region server/data nodes).  Each node has lots of memory and disk (16GB of heap dedicated to RegionServers), 4 TB of disk per node for hdfs.
> I have a table with about 1 million rows in hbase - that's all.  Currently it is split across 50 regions.
> I was monitoring this with the hbase web gui and I noticed that a lot of the heap was being used (14GB).  I was running a MR job and I was getting an error to the console that launched the job:
> Error: GC overhead limit exceeded hbase
>
> First question: is this going to hose the whole system?  I didn't see the error in any of the hbase logs, so I assume that it was purely a client issue.
>
> So, naively thinking that maybe the GC had moved everything to permgen and just wasn't cleaning up, I thought I would do a rolling restart of my region servers and see if that cleared everything up.  The first server I killed happened to be the one that was hosting the .META. table.  Subsequently the web gui failed.  Looking at the errors, it seems that the web gui essentially caches the address for the meta table and blindly tries connecting on every request.  I suppose I could restart the master, but this does not seem like desirable behavior.  Shouldn't the cache be refreshed on error?  And since there is no real code for the GUI, just a jsp page, doesn't this mean that this behavior could be seen in other applications that use HMaster?
>
> Corrections welcome
> Dave
