hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: load balancing considerations
Date Wed, 11 Aug 2010 04:38:53 GMT
Ted:

You have 22 column families in your schema?  Do you need that many?
Run with fewer if you can, because 22 CFs take you into a category that
not many hang out in.  It may be at the root of the OOME.

Otherwise, it's the usual suspects -- a bad record perhaps?  One that
was incorrectly formatted so it had a very large size on it?

Do you run w/ GC logging enabled?  If not, try it.  Apparently it's near
to frictionless.  It might give us more clues.

Also, when the RS crashes, it'll dump heap by default.  Do you see it?
 If you put it someplace that I can pull, I'll take a look at it.

St.Ack

On Tue, Aug 10, 2010 at 9:30 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> We use 0.20.6 with HBASE-2473
> As you can see from the following region server log snippet, OOME happened
> to this RS:
>
> 2010-08-11 03:59:12,760 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 17 on 60020' on region
> 2__HB_NOINC_GRID_0809-THREEGPPSPEECHCALLS-1281499094297,\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E,1281499095128:
> memstore size 1.0g is >= than blocking 1.0g size
> 2010-08-11 03:59:16,853 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 24 on 60020' on region
> 2__HB_NOINC_GRID_0809-THREEGPPSPEECHCALLS-1281499094297,\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E,1281499095128:
> memstore size 1.0g is >= than blocking 1.0g size
> 2010-08-11 03:59:44,524 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError,
> aborting.
> java.lang.OutOfMemoryError: Java heap space
>        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
>        at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:825)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
> 2010-08-11 03:59:44,525 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> request=0.0, regions=9, stores=22, storefiles=4, storefileIndexSize=5,
> memstoreSize=1502, compactionQueueSize=0, usedHeap=*3929*, maxHeap=3973,
> blockCacheSize=6836104, blockCacheFree=826362424, blockCacheCount=0,
> blockCacheHitRatio=0, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
>
> Among the other RS, the highest usedHeap is 1750
>
> On Sat, Jul 31, 2010 at 3:31 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>
>> Hi,
>>
>> #3 is going to be tricky... due to the ebb and flow of the GC, this value
>> isn't as accurate as one would wish. Furthermore, we flush memstores based
>> on RAM pressure.
>>
>> Any algorithm would have to have the property of being stable and
>> conservative... rebalancing is not a 0 impact operation.
>>
>> There are jiras open for the rebalance based on load. To date it hasn't
>> been a practical problem here at SU in our prod clusters, however.
>>
>> On Jul 31, 2010 3:18 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>> > Hi,
>> > Currently load balancing only considers region count.
>> > See ServerManager.getAverageLoad()
>> >
>> > I think load balancing should consider the following three factors for
>> > each RS:
>> > 1. number of regions it hosts
>> > 2. number of requests it serves within given period
>> > 3. how close usedHeap is to maxHeap
>> >
>> > Please comment on how we should weigh the above three factors in deciding
>> > the regions to offload from each RS.
>> >
>> > Thanks
>>
>
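
To make the three factors from the start of the thread concrete, here is a minimal sketch of one way they might be combined into a single per-server score. It is not how HBase balances regions (the thread says only region count is considered today). The class `LoadScoreSketch`, the `ServerLoad` type, the weights, and the `slop` margin are all hypothetical names and values chosen for illustration. The heap figures for the first server come from the metrics dump above; the second server's usedHeap is the 1750 quoted above, while its maxHeap and region count are assumptions. The `shouldOffload` margin is a nod to the point that any algorithm should be stable and conservative.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these types exist in HBase.
final class LoadScoreSketch {

    /** Hypothetical snapshot of one region server's load. */
    static final class ServerLoad {
        final String name;
        final int regions;           // factor 1: number of regions hosted
        final double requestsPerSec; // factor 2: requests served in the period
        final int usedHeapMb;        // factor 3: used heap ...
        final int maxHeapMb;         // ... relative to max heap

        ServerLoad(String name, int regions, double requestsPerSec,
                   int usedHeapMb, int maxHeapMb) {
            this.name = name;
            this.regions = regions;
            this.requestsPerSec = requestsPerSec;
            this.usedHeapMb = usedHeapMb;
            this.maxHeapMb = maxHeapMb;
        }
    }

    /**
     * Weighted sum of the three factors. Region count and request rate are
     * normalized by the cluster average; heap is a plain used/max ratio.
     * A zero average or zero max heap contributes 0 for that term.
     */
    static double score(ServerLoad s, double avgRegions, double avgRequests,
                        double wRegions, double wRequests, double wHeap) {
        double regionTerm = avgRegions > 0 ? s.regions / avgRegions : 0.0;
        double requestTerm = avgRequests > 0 ? s.requestsPerSec / avgRequests : 0.0;
        double heapTerm = s.maxHeapMb > 0 ? (double) s.usedHeapMb / s.maxHeapMb : 0.0;
        return wRegions * regionTerm + wRequests * requestTerm + wHeap * heapTerm;
    }

    /** Conservative trigger: offload only when the score exceeds the mean by a margin. */
    static boolean shouldOffload(double score, double meanScore, double slop) {
        return score > meanScore * (1.0 + slop);
    }

    public static void main(String[] args) {
        List<ServerLoad> servers = Arrays.asList(
            new ServerLoad("rs-oome", 9, 0.0, 3929, 3973),   // from the metrics dump
            new ServerLoad("rs-other", 9, 0.0, 1750, 3973)); // regions and maxHeap assumed

        // Cluster averages used for normalization.
        double avgRegions = 0.0;
        double avgRequests = 0.0;
        for (ServerLoad s : servers) {
            avgRegions += s.regions;
            avgRequests += s.requestsPerSec;
        }
        avgRegions /= servers.size();
        avgRequests /= servers.size();

        // Equal weights are an arbitrary starting point, not a recommendation.
        double[] scores = new double[servers.size()];
        double mean = 0.0;
        for (int i = 0; i < scores.length; i++) {
            scores[i] = score(servers.get(i), avgRegions, avgRequests, 1.0, 1.0, 1.0);
            mean += scores[i];
        }
        mean /= scores.length;

        for (int i = 0; i < scores.length; i++) {
            System.out.println(servers.get(i).name + " score=" + scores[i]
                + " offload=" + shouldOffload(scores[i], mean, 0.1));
        }
    }
}
```

Because usedHeap rises and falls with GC, a real implementation would probably want to smooth that term over time before feeding it into any score; the sketch uses a single sample only to keep the arithmetic visible.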
