hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: dilemma of memory and CPU for hbase.
Date Thu, 01 Jul 2010 21:17:44 GMT
(taking the conversation back to the list after receiving logs and heap dump)

The issue here is actually much nastier than it seems. But before I
describe the problem, you said:

>  I have 3 machines as hbase master (only 1 is active), 3 zookeepers. 8
> regionservers.

If those are all distinct machines, you are wasting a lot of hardware.
Unless you have an HA Namenode (which I highly doubt), you already have
a SPOF there, so you might as well put every service on that single
node (1 master, 1 ZooKeeper). You might be afraid of using only 1 ZK
node, but unless you share the ZooKeeper ensemble between clusters,
losing the Namenode is as bad as losing ZK, so you might as well put
them together. At StumbleUpon we have 2-3 clusters using the same
ensembles, so there it makes more sense to put them in an HA setup.
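For context on the ZK side of that trade-off: a ZooKeeper ensemble keeps serving only while a strict majority of its servers is up, so 3 nodes survive one failure and a single node survives none. The small Java sketch below only does that arithmetic (class and method names are illustrative, not part of any HBase or ZooKeeper API). With a single Namenode, the extra failure a 3-node ensemble can absorb does not make the cluster as a whole more available.

```java
// Sketch: failures a majority-quorum ZooKeeper ensemble can tolerate.
// Names are illustrative only.
final class ZkQuorum {
    // An ensemble of n servers needs a strict majority alive to serve requests.
    static int quorumSize(int n) {
        return n / 2 + 1;
    }

    // Servers that can fail while a majority remains: n - (n / 2 + 1).
    static int toleratedFailures(int n) {
        return n - quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 5}) {
            System.out.println(n + " ZK node(s): quorum of " + quorumSize(n)
                    + ", survives " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```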

That said, in your log I see:

2010-06-29 00:00:00,064 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
interrupted at index=0 because:Requested row out of range for HRegion
Spam_MsgEventTable,2010-06-28 11:34:02blah
...
2010-06-29 12:26:13,352 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
interrupted at index=0 because:Requested row out of range for HRegion
Spam_MsgEventTable,2010-06-28 11:34:02blah

So for 12 hours (and probably more), the same row was requested almost
every 100ms, and every attempt failed with a WrongRegionException
(that's the exception behind the message above). You are probably using
the client write buffer since you want to import as fast as possible, so
each of these batches is deserialized on the region server and then
becomes garbage as soon as the RPC fails. That rate of failed insertions
must have kept your garbage collector _very_ busy, and at some point the
JVM OOMEd. This is the stack from your OOME:

java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)

This is where we deserialize client data, so it correlates with what I
just described.
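To put a rough number on "_very_ busy", here is a back-of-envelope sketch in Java. It assumes a single client retrying one failed batch every 100 ms, with each batch carrying a full 2 MiB write buffer (2 MB was the default `hbase.client.write.buffer` in that era; the real buffer size and number of clients in this setup are unknown). Class and method names are illustrative.

```java
// Back-of-envelope allocation pressure from the rejected batch puts.
// ASSUMPTIONS (not measured): one failed batch every 100 ms from one client,
// each batch carrying a full 2 MiB client write buffer.
final class FailedPutPressure {
    static final long BUFFER_BYTES = 2L * 1024 * 1024; // assumed bytes per batch
    static final long INTERVAL_MS = 100;               // "almost every 100ms"

    // Number of rejected batches over the given number of hours.
    static long batches(long hours) {
        return hours * 3600 * 1000 / INTERVAL_MS;
    }

    // Bytes deserialized per second that become garbage once the RPC fails.
    static long bytesPerSecond() {
        return BUFFER_BYTES * (1000 / INTERVAL_MS);
    }

    public static void main(String[] args) {
        long n = batches(12);
        System.out.println("rejected batches in 12h: " + n);
        System.out.println("garbage rate (MiB/s): " + bytesPerSecond() / (1024 * 1024));
        System.out.println("total garbage (GiB): " + n * BUFFER_BYTES / (1024L * 1024 * 1024));
    }
}
```

Under those assumptions that is 432,000 rejected batches over 12 hours and about 20 MiB/s of short-lived allocations, all of which the collector has to reclaim on top of the server's normal work; several clients retrying would multiply it.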

Now, this means that you probably have a hole (or more) in your .META.
table. This usually happens after a region server that was carrying
.META. fails (since data loss is possible with that version of HDFS), or
when a bug in the master messes up the .META. region. Now, two things:

 - It would be nice to know why you have a hole. Look at your .META.
table around the row from your region server log; you should see that
the start/end keys don't match. Then search the master log from
yesterday for what went wrong: maybe there are some exceptions, or
maybe a region server that was hosting .META. failed for some reason.

 - You probably want to fix your table. Use the bin/add_table.rb
script (other people on this list have used it in the past; search the
archive for more info).
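To make the "hole" idea concrete: every row of a table should fall in exactly one region's [start key, end key) range, so walking the regions in start-key order and comparing each end key with the next start key shows where .META. is broken. Below is a minimal Java sketch of that check under a simplified in-memory model (String keys instead of HBase's byte[] keys, a TreeMap from start key to end key, hypothetical names and boundaries). It does not read the real .META. table, and it is not how add_table.rb works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one table's regions: startKey -> endKey, sorted by
// startKey like the table's rows in .META. An empty endKey means "end of table".
final class MetaHoleCheck {
    private final TreeMap<String, String> regions = new TreeMap<>();

    void addRegion(String startKey, String endKey) {
        regions.put(startKey, endKey);
    }

    // Reports every place where a region's end key does not match the next
    // region's start key (a hole or an overlap).
    List<String> findMismatches() {
        List<String> problems = new ArrayList<>();
        String expectedStart = ""; // the first region should start at the empty key
        for (Map.Entry<String, String> e : regions.entrySet()) {
            if (!e.getKey().equals(expectedStart)) {
                problems.add("region ending at '" + expectedStart
                        + "' is followed by region starting at '" + e.getKey() + "'");
            }
            expectedStart = e.getValue();
        }
        if (!expectedStart.isEmpty()) { // the last region should end at the empty key
            problems.add("last region ends at '" + expectedStart + "' instead of end of table");
        }
        return problems;
    }

    // True if the region a client would pick for this row (greatest start key
    // <= row) actually contains it; false means the row falls in a hole.
    boolean isCovered(String row) {
        Map.Entry<String, String> e = regions.floorEntry(row);
        if (e == null) {
            return false;
        }
        String end = e.getValue();
        return end.isEmpty() || row.compareTo(end) < 0;
    }

    public static void main(String[] args) {
        MetaHoleCheck table = new MetaHoleCheck();
        table.addRegion("", "2010-06-01");
        table.addRegion("2010-06-01", "2010-06-20");
        table.addRegion("2010-06-29", ""); // nothing covers [2010-06-20, 2010-06-29)
        System.out.println(table.findMismatches());
        System.out.println(table.isCovered("2010-06-25 00:00:00"));
    }
}
```

With these made-up boundaries, a row like "2010-06-25 00:00:00" maps to the region starting at 2010-06-01, which ends before it, so that region rejects the row no matter how many times the client retries.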

Finally (whew!), if you are still developing your solution around
HBase, you might want to try out one of our dev releases, which work
with a durable Hadoop release. See
http://hbase.apache.org/docs/r0.89.20100621/ for more info. Cloudera's
CDH3b2 also has everything you need.

J-D

On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> 653 regions is very low, even if you had a total of 3 region servers I
> wouldn't expect any problem.
>
> So to me it seems to point towards either a configuration issue or a
> usage issue. Can you:
>
>  - Put the log of one region server that OOMEd on a public server.
>  - Tell us more about your setup: # of nodes, hardware, configuration file
>  - Tell us more about how you insert data into HBase
>
> And BTW are you trying to do an initial import of your data set? If
> so, have you considered using HFileOutputFormat?
>
> Thx,
>
> J-D
>
> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu <jinsong_hu@hotmail.com> wrote:
>> Hi, Sir:
>>  I am using hbase 0.20.5 and this morning I found that 3 of my region
>> servers ran out of memory. Each regionserver is given 6G of memory, and
>> I have 653 regions in total. The max store size is 256M. I analyzed the
>> heap dump and it shows that there are too many HRegion objects in memory.
>>
>>  Previously I set the max store size to 2G, but then I found the region
>> server constantly doing minor compactions and the CPU usage was very high.
>> It also blocked the heavy client record insertion.
>>
>>  So now I am limited on one side by memory and on the other side by CPU.
>> Is there any way out of this dilemma?
>>
>> Jimmy.
>>
>
