hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Memory leak in HBase replication ?
Date Wed, 17 Jul 2013 16:33:31 GMT
Those puts should get cleared right away, so the fact that they are still live in memory
usually points to very full IPC queues. If you jstack those region servers, are all the
handler threads busy? What do the logs show before it starts doing full GCs? Can we see them?
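
For reference, a quick way to check that is to dump the region server's threads with jstack and
look at the RPC handlers. A minimal sketch (the pid placeholder and the "IPC Server handler"
thread-name pattern are assumptions based on 0.94-era naming, so adjust to what you actually see):

    # Find the region server pid, dump its threads, and inspect the RPC handler threads.
    jps | grep HRegionServer
    jstack <regionserver-pid> > rs-threads.txt
    grep -A 2 "IPC Server handler" rs-threads.txt | less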



On Wed, Jul 17, 2013 at 9:06 AM, Anusauskas, Laimonas
<LAnusauskas@corp.untd.com> wrote:
> Hi,
> I am fairly new to HBase. We are trying to set up an OpenTSDB system here and have just started
setting up production clusters. We have 2 datacenters, on the west and east coasts, and we want to
have 2 active-passive HBase clusters with HBase replication between them. Right now each cluster
has 4 nodes (1 master, 3 slaves); we will add more nodes as the load ramps up. Setup went
fine and data started getting replicated from one cluster to another, but as soon as load
picked up, the regionservers on the slave cluster started running out of heap and getting killed. I
increased the heap size on the regionservers from the default 1000M to 2000M, but the result was the same.
I also updated HBase from the version that came with Hortonworks (hbase-
to hbase-0.94.9 - still the same.
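
For anyone following along, here is a minimal sketch of the kind of replication setup being
described, assuming a 0.94-style cluster; the peer id, the ZooKeeper quorum, and the 't' column
family of the tsdb table below are illustrative placeholders, not taken from this thread:

    # hbase-site.xml on both clusters needs hbase.replication=true, then in the
    # hbase shell on the source (master) cluster, register the slave cluster as a peer:
    add_peer '1', 'zk-east-1,zk-east-2,zk-east-3:2181:/hbase'

    # Turn on replication for the column family being shipped to the slave cluster:
    disable 'tsdb'
    alter 'tsdb', {NAME => 't', REPLICATION_SCOPE => 1}
    enable 'tsdb'
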
> Now the load on the source cluster is still very light. There is one active table - tsdb -
and its compressed size is less than 200M. But as soon as I start replication, the usedHeapMB metric
on the regionservers in the slave cluster starts going up, then full GC kicks in and eventually the
process is killed because "-XX:OnOutOfMemoryError=kill -9 %p" is set.
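
For reference, the heap size and the kill-on-OOM flag mentioned above are typically set in
conf/hbase-env.sh along these lines (the values here are illustrative, matching the 2000M heap
that was tried):

    # conf/hbase-env.sh (illustrative values)
    export HBASE_HEAPSIZE=2000
    # Region-server-specific JVM options, including the kill-on-OOM flag quoted above.
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:OnOutOfMemoryError='kill -9 %p'"
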
> I did a heap dump and ran the Eclipse Memory Analyzer, and here is what it reported:
> One instance of "java.util.concurrent.LinkedBlockingQueue" loaded by "<system class
loader>" occupies 1,411,643,656 (67.87%) bytes. The instance is referenced by org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server
@ 0x7831c37f0 , loaded by "sun.misc.Launcher$AppClassLoader @ 0x783130980". The memory is
accumulated in one instance of "java.util.concurrent.LinkedBlockingQueue$Node" loaded by "<system
class loader>".
> And
> 502,763 instances of "org.apache.hadoop.hbase.client.Put", loaded by "sun.misc.Launcher$AppClassLoader
@ 0x783130980" occupy 244,957,616 (11.78%) bytes.
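
For anyone wanting to reproduce this kind of analysis, a heap dump like the one described can be
captured with jmap and then opened in Eclipse MAT; the pid and output path below are placeholders:

    # Dump the live heap of the running region server to a file MAT can open.
    jmap -dump:live,format=b,file=/tmp/regionserver-heap.hprof <regionserver-pid>
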
> There is nothing in the logs until full GC kicks in, at which point all hell breaks loose
and things start timing out, etc.
> I did a bunch of searching but came up with nothing. I could add more RAM to the nodes
and increase the heap size, but I suspect that would only prolong the time until the heap gets full.
> Any help would be appreciated.
> Limus
