hbase-issues mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2506) Too easy to OOME a RS
Date Thu, 03 Mar 2011 22:57:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002297#comment-13002297 ]

ryan rawson commented on HBASE-2506:
------------------------------------

We could catch the OOME in this case and return an error to the client
instead.  If we are unable to allocate a 500MB buffer to send an RPC
response, that does not necessarily need to kill the RS, because if we
are truly out of memory, other threads will catch that.  So catch the
OOME there, then send an exception response instead.

Does that sound good?
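
A minimal sketch of the idea, under stated assumptions: the class
GuardedResponseWriter, its Response type, and serialize() are hypothetical
names made up for illustration, and this is not the actual HBaseServer
handler code. It only shows the shape of "fail the one response, not the
whole server" when the response buffer cannot be allocated.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical helper for illustration; not part of HBase. */
final class GuardedResponseWriter {

    /** Holds either the serialized payload or an error message, never both. */
    static final class Response {
        final byte[] payload; // null if serialization failed
        final String error;   // null if serialization succeeded

        private Response(byte[] payload, String error) {
            this.payload = payload;
            this.error = error;
        }
    }

    /**
     * Writes each row as a length-prefixed byte array into a buffer that is
     * pre-sized to expectedSize bytes. If the buffer cannot be allocated or
     * grown, the OutOfMemoryError is turned into an error Response instead
     * of propagating to the caller.
     */
    static Response serialize(int expectedSize, byte[][] rows) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream(expectedSize);
            DataOutputStream out = new DataOutputStream(buffer);
            for (byte[] row : rows) {
                out.writeInt(row.length);
                out.write(row);
            }
            out.flush();
            return new Response(buffer.toByteArray(), null);
        } catch (OutOfMemoryError e) {
            // The partially built buffer is out of scope here, so it can be
            // garbage collected before the error response is sent.
            return new Response(null, "response too large to buffer: " + e.getMessage());
        } catch (IOException e) {
            return new Response(null, "serialization failed: " + e.getMessage());
        }
    }
}
```

Only the failed request is answered with an error here. If the heap really
is exhausted, allocations in other threads will fail too, and whatever
handles OOME there still gets its chance to abort the server.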


> Too easy to OOME a RS
> ---------------------
>
>                 Key: HBASE-2506
>                 URL: https://issues.apache.org/jira/browse/HBASE-2506
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>              Labels: moved_from_0_20_5
>             Fix For: 0.92.0
>
>
> Testing a cluster with 1GB heap, I found that we are letting the region
> servers kill themselves too easily when scanning using pre-fetching. To
> reproduce, get 10-20M rows using PE and run a count in the shell using
> CACHE => 30000 or any other very high number. For good measure, here's the
> stack trace:
> {code}
> 2010-04-30 13:20:23,241 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2786)
>         at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.hbase.client.Result.writeArray(Result.java:478)
>         at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:312)
>         at org.apache.hadoop.hbase.io.HbaseObjectWritable.write(HbaseObjectWritable.java:229)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:941)
> 2010-04-30 13:20:23,241 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=29, stores=29, storefiles=44, storefileIndexSize=6, memstoreSize=255, compactionQueueSize=0, usedHeap=926, maxHeap=987, blockCacheSize=1700064, blockCacheFree=205393696, blockCacheCount=0, blockCacheHitRatio=0
> {code}
> I guess the same could happen with largish write buffers. We need something
> better than OOME.


        
