From: "Andrew Purtell (JIRA)"
To: hbase-dev@hadoop.apache.org
Reply-To: hbase-dev@hadoop.apache.org
Date: Wed, 3 Dec 2008 20:48:44 -0800 (PST)
Subject: [jira] Resolved: (HBASE-1040) OOME does not cause graceful shutdown under some failure scenarios
Message-ID: <348655118.1228366124254.JavaMail.jira@brutus>
In-Reply-To: <1798467071.1228183307437.JavaMail.jira@brutus>
Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm

     [ https://issues.apache.org/jira/browse/HBASE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-1040.
-----------------------------------

    Resolution: Fixed

Fixed by HBASE-1042.

> OOME does not cause graceful shutdown under some failure scenarios
> ------------------------------------------------------------------
>
>                 Key: HBASE-1040
>                 URL: https://issues.apache.org/jira/browse/HBASE-1040
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.1
>            Reporter: Andrew Purtell
>             Fix For: 0.19.0
>
>
> I am seeing these exceptions on our cluster in the output from tablemap/tablereduce jobs:
> > java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> >         at java.io.DataInputStream.readFully(DataInputStream.java:175)
> >         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> >         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> >         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
> >         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
> >         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
> >         at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
> >         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
> When OOMEs like the one above occur, the cluster does not recover without manual intervention. The regionservers sometimes go down afterwards, and sometimes stay up in an unhealthy state for a while. Regions go offline and remain unavailable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
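
For illustration only: in the trace quoted in this issue, the OutOfMemoryError reaches the caller wrapped in an IOException, so a handler that catches only OutOfMemoryError would not see it. The sketch below shows one way such a wrapped OOME could be recognized, by walking the cause chain and, as a fallback, checking the exception message. The class OomeCheck and the method indicatesOutOfMemory are hypothetical names invented for this example. They are not HBase code, and this is not necessarily how HBASE-1042 addressed the problem.

```java
import java.io.IOException;

/** Hypothetical helper for illustration; not part of HBase. */
final class OomeCheck {
    private OomeCheck() {}

    /**
     * Returns true if the given throwable, or anything in its cause chain,
     * is an OutOfMemoryError, or if a message in the chain names one
     * (as in "java.io.IOException: java.lang.OutOfMemoryError: Java heap space").
     */
    static boolean indicatesOutOfMemory(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof OutOfMemoryError) {
                return true;
            }
            // Fallback: the OOME may have been flattened into a message string.
            String msg = cur.getMessage();
            if (msg != null && msg.contains("java.lang.OutOfMemoryError")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable wrapped = new IOException(new OutOfMemoryError("Java heap space"));
        if (indicatesOutOfMemory(wrapped)) {
            // A real server would request an orderly shutdown here (assumption).
            System.out.println("OOME detected; shutdown would be requested");
        }
    }
}
```

A caller could run a check like this in the catch block around a scanner or I/O operation and request an orderly shutdown when it returns true, rather than leaving the process running in an unhealthy state.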