Date: Tue, 25 Oct 2011 23:20:32 +0000 (UTC)
From: "Jean-Daniel Cryans (Commented) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1073614840.15777.1319584832436.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1265762513.2972.1297190817469.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-3515) [replication] ReplicationSource can miss a log after RS comes out of GC

[
https://issues.apache.org/jira/browse/HBASE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135545#comment-13135545 ]

Jean-Daniel Cryans commented on HBASE-3515:
-------------------------------------------

To reiterate the problem: it's possible to fail to add an HLog for replication if the ZooKeeper session has expired when the log is rolled. HLog currently doesn't get any feedback from the WALActionListeners, even if they fail at doing their job.

One way of fixing it would be to throw an exception and stop the log rolling, but if there are many listeners, some may already have processed the addition of the log. We could also simply kill the region server if it happens.

I'm in favor of the latter.

> [replication] ReplicationSource can miss a log after RS comes out of GC
> -----------------------------------------------------------------------
>
>                 Key: HBASE-3515
>                 URL: https://issues.apache.org/jira/browse/HBASE-3515
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3515.patch
>
>
> This is from Hudson build 1738. If a log is about to be rolled and the ZK connection is already closed, then the replication code will fail at adding the new log in ZK, but the log will still be rolled and it's possible that some edits will make it in.
> From the log:
> {quote}
> 2011-02-08 10:21:20,618 FATAL [RegionServer:0;vesta.apache.org,46117,1297160399378.logRoller] regionserver.HRegionServer(1383):
> ABORTING region server serverName=vesta.apache.org,46117,1297160399378, load=(requests=1525, regions=12,
> usedHeap=273, maxHeap=1244): Failed add log to list
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for
> /1/replication/rs/vesta.apache.org,46117,1297160399378/2/vesta.apache.org%3A46117.1297160480509
> ...
> 2011-02-08 10:21:22,444 DEBUG [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] wal.HLogSplitter(258):
> Splitting hlog 8 of 8: hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509, length=0
> 2011-02-08 10:21:22,862 DEBUG [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] wal.HLogSplitter(436):
> Pushed=31 entries from hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509
> {quote}
> The easiest thing to do would be to let the exception out and cancel the log roll.
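
To make the trade-off in the comment concrete, here is a minimal Java sketch of the two options: cancelling the roll when a listener fails, versus aborting the server. Every name in it (`RollListener`, `RecordingListener`, `FailingListener`, `RollSketch`, `Policy`) is a hypothetical stand-in invented for illustration. None of it is HBase's actual HLog or listener API, and the real patch may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a WAL listener; NOT the real HBase interface.
interface RollListener {
    void logRolled(String newLogName) throws Exception;
}

// Records every log it is told about, as a bookkeeping listener might.
class RecordingListener implements RollListener {
    final List<String> seen = new ArrayList<>();

    @Override
    public void logRolled(String newLogName) {
        seen.add(newLogName);
    }
}

// Simulates a listener that cannot register the log (e.g. expired session).
class FailingListener implements RollListener {
    @Override
    public void logRolled(String newLogName) throws Exception {
        throw new Exception("ConnectionLoss while adding " + newLogName);
    }
}

// Hypothetical log roller that gives listeners a way to report failure.
class RollSketch {
    enum Policy { CANCEL_ROLL, ABORT_SERVER }

    final List<RollListener> listeners = new ArrayList<>();
    final Policy policy;
    String currentLog = "log.1";
    boolean aborted = false;

    RollSketch(Policy policy) {
        this.policy = policy;
    }

    // Returns true only if every listener accepted the new log.
    boolean roll(String newLogName) {
        if (aborted) {
            return false;
        }
        for (RollListener listener : listeners) {
            try {
                listener.logRolled(newLogName);
            } catch (Exception e) {
                if (policy == Policy.ABORT_SERVER) {
                    // Option 2: stop the server so no edit reaches an untracked log.
                    aborted = true;
                }
                // Option 1 (and also option 2): the roll itself does not happen.
                // Listeners notified earlier in the loop already saw newLogName.
                return false;
            }
        }
        currentLog = newLogName;
        return true;
    }

    public static void main(String[] args) {
        RollSketch cancel = new RollSketch(Policy.CANCEL_ROLL);
        RecordingListener first = new RecordingListener();
        cancel.listeners.add(first);
        cancel.listeners.add(new FailingListener());
        boolean rolled = cancel.roll("log.2");
        System.out.println("rolled=" + rolled
            + " currentLog=" + cancel.currentLog
            + " firstListenerSaw=" + first.seen);

        RollSketch abort = new RollSketch(Policy.ABORT_SERVER);
        abort.listeners.add(new FailingListener());
        abort.roll("log.2");
        System.out.println("aborted=" + abort.aborted);
    }
}
```

With `CANCEL_ROLL`, the first listener has already recorded a log that never becomes current, which is the partial-processing concern raised in the comment. With `ABORT_SERVER`, that inconsistency stops mattering because the server takes no further writes.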