Date: Fri, 26 Aug 2011 17:41:29 +0000 (UTC)
From: "Dave Latham (JIRA)"
To: issues@hbase.apache.org
Message-ID: <841626398.18470.1314380489252.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1089170170.18155.1310753280446.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-4107) OOME while writing WAL checksum causes corrupt WAL

[
https://issues.apache.org/jira/browse/HBASE-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091896#comment-13091896 ]

Dave Latham commented on HBASE-4107:
------------------------------------

It looks like HLog main has support to invoke --split. Does it look like calling that on the log will split it and put the data into the right place?

We had a handful of regionservers go OOM yesterday while an MR job was doing heavy writes to a column family that doesn't usually get them. In this case, the first OOM occurred here, while writing the checksum.

> OOME while writing WAL checksum causes corrupt WAL
> --------------------------------------------------
>
>                 Key: HBASE-4107
>                 URL: https://issues.apache.org/jira/browse/HBASE-4107
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, wal
>   Affects Versions: 0.90.1
>         Environment: CentOS 5.5x64
>            Reporter: Andy Sautins
>         Attachments: master.splitting.log, regionserver.oom.log
>
>
> An issue was observed where, upon shutdown of a regionserver, the regionserver log was corrupt. It appears from the following stack trace that a Java heap memory exception occurred while writing the checksum to the WAL. Corrupting the WAL can potentially cause data loss.
> 2011-07-14 14:54:53,741 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
> java.io.IOException: Reflection
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:987)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:964)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.GeneratedMethodAccessor1336.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>         ... 2 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2375)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:3271)
>         at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>         at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3354)
>         at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>         at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>         ... 6 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
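
The quoted stack trace shows a three-level chain: an IOException with the message "Reflection", caused by an InvocationTargetException, caused by an OutOfMemoryError. The sketch below is one way to reproduce that shape in isolation. It is not the HBase or HDFS code: ReflectionSyncSketch, FakeStream, syncViaReflection, and rootCause are hypothetical names made up for illustration, and the OutOfMemoryError is thrown by hand rather than by the JVM. It only shows that an Error thrown inside a reflectively invoked method reaches the caller wrapped, so a handler that looks at the outer IOException alone would not see that the underlying problem was heap exhaustion.

```java
import java.io.IOException;
import java.lang.reflect.Method;

public class ReflectionSyncSketch {

    /** Hypothetical stand-in for an output stream; not an HBase or HDFS class. */
    public static class FakeStream {
        public void sync() {
            // Simulated failure. A real OOME would be raised by the JVM
            // during allocation, not thrown explicitly like this.
            throw new OutOfMemoryError("Java heap space (simulated)");
        }
    }

    /** Calls sync() reflectively and wraps any exception in an IOException. */
    public static void syncViaReflection(Object stream) throws IOException {
        try {
            Method sync = stream.getClass().getMethod("sync");
            // Method.invoke wraps whatever the target throws, including
            // Errors, in an InvocationTargetException.
            sync.invoke(stream);
        } catch (Exception e) {
            throw new IOException("Reflection", e);
        }
    }

    /** Walks the cause chain down to the innermost Throwable. */
    public static Throwable rootCause(Throwable t) {
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    public static void main(String[] args) {
        try {
            syncViaReflection(new FakeStream());
        } catch (IOException e) {
            // Prints the wrapper first, then the underlying error.
            System.out.println(e.getCause().getClass().getName());
            System.out.println(rootCause(e).getClass().getName());
        }
    }
}
```

Running main prints java.lang.reflect.InvocationTargetException followed by java.lang.OutOfMemoryError, the same two "Caused by" types that appear in the trace above.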