Message-ID: <11599018.5891290944917973.JavaMail.jira@thor>
Date: Sun, 28 Nov 2010 06:48:37 -0500 (EST)
From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] Commented: (HBASE-2506) Too easy to OOME a RS

    [ https://issues.apache.org/jira/browse/HBASE-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964539#action_12964539 ]

Ted Yu commented on HBASE-2506:
-------------------------------

Andrew mentioned deferring splits as one of the reactions to a low-memory condition.
One scenario for performing a split is that a few heavily written regions all reside on the same region server. In this case we can move them to less loaded servers and split them to distribute the load. If the load balancer is aware of server load in terms of number of accesses, this scenario could at least be delayed.

> Too easy to OOME a RS
> ---------------------
>
>                 Key: HBASE-2506
>                 URL: https://issues.apache.org/jira/browse/HBASE-2506
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>
> Testing a cluster with 1GB heap, I found that we are letting the region servers kill themselves too easily when scanning using pre-fetching. To reproduce, get 10-20M rows using PE and run a count in the shell using CACHE => 30000 or any other very high number. For good measure, here's the stack trace:
> {code}
> 2010-04-30 13:20:23,241 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2786)
>         at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.hbase.client.Result.writeArray(Result.java:478)
>         at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:312)
>         at org.apache.hadoop.hbase.io.HbaseObjectWritable.write(HbaseObjectWritable.java:229)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:941)
> 2010-04-30 13:20:23,241 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=29, stores=29, storefiles=44, storefileIndexSize=6, memstoreSize=255,
> compactionQueueSize=0, usedHeap=926, maxHeap=987, blockCacheSize=1700064, blockCacheFree=205393696, blockCacheCount=0, blockCacheHitRatio=0
> {code}
> I guess the same could happen with largish write buffers. We need something better than OOME.
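
For illustration only: the quoted report fails because a single scanner response for CACHE => 30000 rows is serialized into one growing byte array. One possible mitigation, sketched below, is to cap a batch by accumulated bytes as well as by row count. This is not HBase code; the class name BoundedBatch, the method nextBatch, and the parameters maxRows and maxBytes are all hypothetical, and rows are modeled as plain byte arrays.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not part of HBase: build one scanner response that is
// bounded by a byte budget in addition to the requested row count.
final class BoundedBatch {

    // Returns up to maxRows rows, stopping early once the accumulated size
    // reaches maxBytes. The byte limit is soft: the row that crosses the
    // limit is still included, so a non-empty input always makes progress.
    static List<byte[]> nextBatch(Iterator<byte[]> rows, int maxRows, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long bytes = 0;
        while (rows.hasNext() && batch.size() < maxRows) {
            byte[] row = rows.next();
            batch.add(row);
            bytes += row.length;
            if (bytes >= maxBytes) {
                break; // byte budget reached; the caller fetches the rest later
            }
        }
        return batch;
    }
}
```

With such a cap, a client asking for 30000 rows per call would simply receive smaller batches when rows are large, at the cost of more round trips, instead of forcing the server to buffer the whole response on the heap.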