Date: Sat, 5 Sep 2015 05:01:46 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-14368) New TestWALLockup broken by addendum added to parent issue

[ https://issues.apache.org/jira/browse/HBASE-14368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731803#comment-14731803 ]

stack commented on HBASE-14368:
-------------------------------

kalashnikov:hbase.git stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/job/PreCommit-HBASE-Build/15423/consoleText
Fetching the console output from the URL
Printing hanging tests
Printing Failing tests

Let me retry.

> New TestWALLockup broken by addendum added to parent issue
> ----------------------------------------------------------
>
> Key: HBASE-14368
> URL: https://issues.apache.org/jira/browse/HBASE-14368
> Project: HBase
> Issue Type: Sub-task
> Components: test
> Reporter: stack
> Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14368.txt, 14368.txt
>
>
> My second addendum broke TestWALLockup, the one that did this: https://issues.apache.org/jira/browse/HBASE-14317?focusedCommentId=14730301&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14730301
> {code}
> diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> index 5708c30..c421f5c 100644
> --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> @@ -878,8 +878,19 @@ public class FSHLog implements WAL {
>        // Let the writer thread go regardless, whether error or not.
>        if (zigzagLatch != null) {
>          zigzagLatch.releaseSafePoint();
> -        // It will be null if we failed our wait on safe point above.
> -        if (syncFuture != null) blockOnSync(syncFuture);
> +        // syncFuture will be null if we failed our wait on safe point above. Otherwise, if
> +        // latch was obtained successfully, the sync we threw in either trigger the latch or it
> +        // got stamped with an exception because the WAL was damaged and we could not sync. Now
> +        // the write pipeline has been opened up again by releasing the safe point, process the
> +        // syncFuture we got above. This is probably a noop but it may be stale exception from
> +        // when old WAL was in place. Catch it if so.
> +        if (syncFuture != null) {
> +          try {
> +            blockOnSync(syncFuture);
> +          } catch (IOException ioe) {
> +            if (LOG.isTraceEnabled()) LOG.trace("Stale sync exception", ioe);
> +          }
> +        }
> {code}
> It broke the test because the test hand-feeds appends and syncs and controls when they should throw exceptions. In the test, we manufactured the case where an append fails and then asserted that the following sync would fail.
> The problem was that we expected the failure to be a dropped snapshot failure, because a failed sync is a catastrophic event. But our hand-feeding actually reproduced the case where a sync goes into the damaged file before the WAL has rolled, which is no longer a catastrophic event: we just catch the exception and move on.
> The attached patch simply removes the check for the dropped snapshot and the check that abort was called.
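The catch-and-move-on behavior in the patch quoted above can be sketched in isolation. This is a minimal, self-contained illustration under stated assumptions, not FSHLog's real code: `StaleSyncSketch`, `SyncFutureSketch` and `drainAfterRelease` are hypothetical names, a `java.util.concurrent.CompletableFuture` stands in for HBase's sync future, and the WAL, safe point and roll are not modeled at all. It shows only that, once the safe point is released, an IOException carried by the earlier sync future is absorbed rather than propagated.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class StaleSyncSketch {

  /** Hypothetical stand-in for a sync future: completes with a txid or an IOException. */
  static final class SyncFutureSketch {
    private final CompletableFuture<Long> delegate = new CompletableFuture<>();

    void done(long txid) {
      delegate.complete(txid);
    }

    void fail(IOException ioe) {
      delegate.completeExceptionally(ioe);
    }

    /** Waits for the sync and rethrows a failed sync as an IOException. */
    long get() throws IOException {
      try {
        // join() waits without responding to interrupts; acceptable for this sketch.
        return delegate.join();
      } catch (CompletionException ce) {
        Throwable cause = ce.getCause();
        if (cause instanceof IOException) {
          throw (IOException) cause;
        }
        throw new IOException(cause);
      }
    }
  }

  /**
   * Mirrors the patched branch: after the safe point is released, drain the sync future
   * obtained earlier. An IOException here may be stale (raised against the old, damaged WAL),
   * so it is caught instead of propagated. Returns true if a stale exception was swallowed.
   */
  static boolean drainAfterRelease(SyncFutureSketch syncFuture) {
    // Null means we failed our wait on the safe point, so there is nothing to drain.
    if (syncFuture == null) {
      return false;
    }
    try {
      syncFuture.get();
      return false;
    } catch (IOException ioe) {
      // The patched FSHLog logs this at TRACE ("Stale sync exception"); here we only report it.
      return true;
    }
  }

  public static void main(String[] args) {
    SyncFutureSketch clean = new SyncFutureSketch();
    clean.done(42L);
    SyncFutureSketch stale = new SyncFutureSketch();
    stale.fail(new IOException("sync went into the damaged WAL before the roll"));

    System.out.println(drainAfterRelease(clean)); // false
    System.out.println(drainAfterRelease(stale)); // true
    System.out.println(drainAfterRelease(null));  // false
  }
}
```

Before the addendum, `blockOnSync(syncFuture)` was called at this point without a try/catch, so the same IOException would have reached the caller. After it, the `true` case is silently absorbed, which is why a test that expects the following sync to fail no longer sees that failure.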