hbase-user mailing list archives

From Seraph Imalia <ser...@eisp.co.za>
Subject RAM Problems - Keeps Crashing
Date Wed, 28 Dec 2011 14:27:25 GMT
Hi Guys,

After upgrading from 0.20.6 to 0.90.4, we have been having serious RAM issues. I had hbase-env.sh
set to use 3 GB of RAM with 0.20.6, but with 0.90.4 even 4.5 GB does not seem to be enough. No
matter how much load the hbase services are under, they crash after 24-48 hours.
The only difference the load makes is how quickly the services crash. Even over this holiday
season, with our lowest load of the year, they crash a little over 36 hours after being started.
To fix it, I have to run stop-hbase.sh, wait a while, kill -9 any hbase processes
that have stopped outputting logs or stopped responding, and then run start-hbase.sh again.

Attached are my logs from the latest "start-to-crash". There are 3 servers, and hbase is being
used for storing URLs. 7 client servers connect to hbase and perform URL lookups at about
40 requests per second (this is the low load over this holiday season). If the URL does not
exist, it gets added. The key on the HTable is the URL, and there are a few fields stored
against it, e.g. DateDiscovered, Host, Script, QueryString, etc.
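
The lookup-or-add pattern described above can be sketched roughly as follows. This is an illustration only: a plain Python dict stands in for the HTable, and the function name `lookup_or_add` and the way the fields are derived from the URL are assumptions, not the actual client code.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit


def lookup_or_add(table, url):
    """Return (row, was_added) for url, adding a new row if it is missing.

    `table` is a plain dict standing in for the HTable (an assumption);
    the URL itself is the key, as in the description above.
    """
    row = table.get(url)
    if row is not None:
        # The URL is already known: this is a pure lookup.
        return row, False

    # The URL is new: derive a few fields and store them against it.
    parts = urlsplit(url)
    row = {
        "DateDiscovered": datetime.now(timezone.utc).isoformat(),
        "Host": parts.netloc,
        "Script": parts.path,
        "QueryString": parts.query,
    }
    table[url] = row
    return row, True


# Example: the first request adds the URL, the second one only looks it up.
table = {}
first, added_first = lookup_or_add(table, "http://example.com/index.php?id=1")
second, added_second = lookup_or_add(table, "http://example.com/index.php?id=1")
```

In this sketch every request costs one read, plus one write only when the URL has not been seen before.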

Each server has a hadoop datanode and an hbase regionserver, and one of the servers additionally
has the namenode, master and zookeeper. On first start, each regionserver uses 2 GB (usedHeap).
As soon as I restart the clients, the usedHeap slowly climbs until it reaches the maxHeap,
and shortly after that the regionservers start crashing. Sometimes they actually shut down
gracefully by themselves.

Originally, we had hbase.regionserver.handler.count set to 100. I have now removed that
setting so that the default is used, which has not helped.

We have not made any changes to the clients and we have a mirrored instance of this in our
UK Data Centre which is still running 0.20.6 and servicing 10 clients currently at over 300
requests per second (again low load over the holidays) and it is 100% stable.

What do I do now? Your website says I cannot downgrade.

Please help

