hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-13391) TestRegionObserverInterface frequently failing on branch-1
Date Sat, 04 Apr 2015 19:41:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395898#comment-14395898 ]

Andrew Purtell edited comment on HBASE-13391 at 4/4/15 7:40 PM:
----------------------------------------------------------------

I was finally able to reproduce a failure once by introducing some external IO and CPU activity.
Attached are logs of TestRegionObserverInterface#testLegacyRecovery for a passing case and
a failing case. 

One thing I see: in the failing case, WAL replay is still ongoing when the test logs "All regions assigned"
(line 683 of TestRegionObserverInterface). Could it be that replay hasn't finished by the time we check
for invocations of the WAL-related coprocessor (CP) methods? I thought I'd see whether disabling distributed
log replay would change the behavior of the test:

{code}
diff --git a/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java b/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
index 5bd8b19..ba028dc 100644
--- a/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
+++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
@@ -39,6 +39,7 @@ import org.apache.hadoop.hbase.CellUtil;
 import org.apache.hadoop.hbase.Coprocessor;
 import org.apache.hadoop.hbase.HBaseTestingUtility;
 import org.apache.hadoop.hbase.HColumnDescriptor;
+import org.apache.hadoop.hbase.HConstants;
 import org.apache.hadoop.hbase.HRegionInfo;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.KeyValue;
@@ -101,6 +102,7 @@ public class TestRegionObserverInterface {
     conf.setStrings(CoprocessorHost.REGION_COPROCESSOR_CONF_KEY,
         "org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver",
         "org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver$Legacy");
+    conf.setBoolean(HConstants.DISTRIBUTED_LOG_REPLAY_KEY, false);
 
     util.startMiniCluster();
     cluster = util.getMiniHBaseCluster();
{code}

but that causes a different sort of failure:

{noformat}
java.lang.AssertionError: Result of org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver$Legacy.getCtPreWALRestore is expected to be 1, while we get 3
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.verifyMethodResult(TestRegionObserverInterface.java:753)
	at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.testLegacyRecovery(TestRegionObserverInterface.java:687)
{noformat}
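
If the hypothesis above is right, the counts are checked while WAL replay is still running, so a counter can still read 0 (as in the original report below). A minimal sketch of one way to make such a check tolerant of that race, assuming the counter is readable from the test thread: poll until it reaches the expected value or a timeout expires, then assert on whatever was observed. `waitForCount` is a hypothetical helper, not an HBase API, and the timeout and poll interval are arbitrary. It only addresses a count that is too low; it would not explain the over-count of 3 in the trace above.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class WaitForCountSketch {

  /**
   * Hypothetical helper (not part of HBase): polls a counter until it reaches
   * {@code expected} or {@code timeoutMs} elapses, then returns the last value seen.
   * Callers still assert on the returned value, so an over-count is not hidden.
   */
  static int waitForCount(IntSupplier counter, int expected, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    int observed = counter.getAsInt();
    while (observed < expected && System.nanoTime() < deadline) {
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status and stop waiting
        break;
      }
      observed = counter.getAsInt();
    }
    return observed;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger ctPreWALRestore = new AtomicInteger();
    // Simulate WAL replay completing on another thread shortly after "All regions assigned".
    Thread replay = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      ctPreWALRestore.incrementAndGet();
    });
    replay.start();
    int observed = waitForCount(ctPreWALRestore::get, 1, 5_000, 50);
    replay.join();
    if (observed != 1) {
      throw new AssertionError("expected 1, while we get " + observed);
    }
    System.out.println("ctPreWALRestore = " + observed);
  }
}
```

Returning the observed value, rather than a boolean, keeps the final assertion in the test where its failure message is most useful.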

Any thoughts on what might be going on here [~busbey]? 


was (Author: apurtell):
(Identical to the current version except for the second paragraph, which read:)

One thing I see is when on line 683 of TestRegionObserverInterface we say "All regions assigned",
in the failing case there is still SplitWorker activity ongoing. Replay ops haven't finished
yet when we check for WAL related CP method invocations? I thought I'd see if disabling distributed
replay would change the behavior of the test.

> TestRegionObserverInterface frequently failing on branch-1 
> -----------------------------------------------------------
>
>                 Key: HBASE-13391
>                 URL: https://issues.apache.org/jira/browse/HBASE-13391
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: test.log.fail.txt, test.log.pass.txt
>
>
> TestRegionObserverInterface is frequently failing on branch-1.
> Example:
> {noformat}
> java.lang.AssertionError: Result of org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver$Legacy.getCtPreWALRestore is expected to be 1, while we get 0
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.verifyMethodResult(TestRegionObserverInterface.java:751)
> 	at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.testLegacyRecovery(TestRegionObserverInterface.java:685)
> {noformat}
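
These assertion messages compare an invocation count recorded by the observer against an expected value. A minimal sketch of that pattern, assuming only that each hook call increments a counter; `CountingObserver` and `verifyCount` are hypothetical names, not the actual SimpleRegionObserver or verifyMethodResult code. Read this way, a count of 0 means the hook had not fired by the time of the check, and a count of 3 means it fired more often than the test expects.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class InvocationCountSketch {

  /** Hypothetical stand-in for a counting observer; not the real SimpleRegionObserver. */
  static class CountingObserver {
    // AtomicInteger because hooks may fire on region server threads, not the test thread.
    private final AtomicInteger ctPreWALRestore = new AtomicInteger();

    // Increments once per invocation of the (hypothetical) hook.
    void preWALRestore() {
      ctPreWALRestore.incrementAndGet();
    }

    int getCtPreWALRestore() {
      return ctPreWALRestore.get();
    }
  }

  /** Hypothetical check producing a message shaped like the test failures above. */
  static void verifyCount(String name, int expected, int actual) {
    if (expected != actual) {
      throw new AssertionError("Result of " + name + " is expected to be " + expected
          + ", while we get " + actual);
    }
  }

  public static void main(String[] args) {
    CountingObserver observer = new CountingObserver();
    observer.preWALRestore();
    verifyCount("CountingObserver.getCtPreWALRestore", 1, observer.getCtPreWALRestore());
    System.out.println("ok");
  }
}
```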



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
