Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 500589585 for ; Mon, 26 Mar 2012 22:22:52 +0000 (UTC) Received: (qmail 47957 invoked by uid 500); 26 Mar 2012 22:22:52 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 47918 invoked by uid 500); 26 Mar 2012 22:22:52 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 47910 invoked by uid 99); 26 Mar 2012 22:22:52 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Mar 2012 22:22:52 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Mar 2012 22:22:48 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id BD6A9347E15 for ; Mon, 26 Mar 2012 22:22:27 +0000 (UTC) Date: Mon, 26 Mar 2012 22:22:27 +0000 (UTC) From: "Hudson (Commented) (JIRA)" To: issues@hbase.apache.org Message-ID: <2052783487.19958.1332800547777.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <1644762299.6162.1332465742678.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238910#comment-13238910 ] Hudson commented on HBASE-5623: ------------------------------- Integrated in HBase-0.94 #57 (See [https://builds.apache.org/job/HBase-0.94/57/]) HBASE-5623 Race condition when rolling the HLog and hlogFlush (Enis Soztutar and LarsH) (Revision 1305549) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java > Race condition when rolling the HLog and hlogFlush > -------------------------------------------------- > > Key: HBASE-5623 > URL: https://issues.apache.org/jira/browse/HBASE-5623 > Project: HBase > Issue Type: Bug > Components: wal > Affects Versions: 0.94.0 > Reporter: Enis Soztutar > Assignee: Enis Soztutar > Priority: Critical > Fix For: 0.94.0 > > Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch > > > When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: > {code} > Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException > at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) > at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) > at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) > at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) > at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) > at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) > at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) > at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) > at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) > at $Proxy1.multi(Unknown Source) > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) > at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) > {code} > and > {code} > java.lang.NullPointerException > at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) > at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) > at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) > at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) > at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) > at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) > at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) > at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) > at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) > at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) > at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) > {code} > It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls > {code} > logSyncerThread.hlogFlush(this.writer); > {code} > without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira