hbase-issues mailing list archives

From "Umesh Agashe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18366) Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
Date Wed, 12 Jul 2017 04:32:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083428#comment-16083428 ]

Umesh Agashe commented on HBASE-18366:

Thanks [~stack], [~yangzhe1991]!

I think it's a timing issue, as I have seen it pass too! But for me it fails far more often
than it passes. I am still debugging it. From what I see, the
TableNotFoundException is for table 'testRecoveryAndDoubleExecution-carryingMeta-true'. This
table is created by the test, and the exception is thrown in util.countRows() when the table is
scanned, in the following code snippet:

      // Now run through the procedure twice crashing the executor on each step...
      MasterProcedureTestingUtility.testRecoveryAndDoubleExecution(procExec, procId);
      // Assert all data came back.
      assertEquals(count, util.countRows(t));

Here is the exception:
org.apache.hadoop.hbase.TableNotFoundException: testRecoveryAndDoubleExecution-carryingMeta-true
  at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:845)
  at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:745)
  at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:720)
  at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:316)
  at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:139)
  at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
  at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:104)
  at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

At this time I am not quite sure how the changes for HBASE-17931 are affecting the test,
but after reverting them locally I ran the test 4-5 times and it passed every time. If the
meta region is being transitioned while the scan is going on, we can see this exception, but I
will have to confirm that's the case here.
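
To illustrate the suspected race, here is a minimal sketch of how a caller could tolerate a lookup
that fails only while meta is in transition. It assumes the failure really is transient, which is
not yet confirmed. The names retry and TransientLookupException and the attempt and sleep values
are hypothetical stand-ins, not HBase APIs; in the real test the caught type would be
TableNotFoundException and the action would be util.countRows(t).

```java
import java.util.concurrent.Callable;

class RetryUntilAvailable {
    /** Hypothetical stand-in for a transient failure such as TableNotFoundException. */
    static class TransientLookupException extends Exception {
        TransientLookupException(String msg) { super(msg); }
    }

    /**
     * Runs the action, retrying only on TransientLookupException.
     * Rethrows the last transient failure once maxAttempts is exhausted.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TransientLookupException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (TransientLookupException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis); // give the region time to come back online
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated row count: fails twice (as if meta were moving), then succeeds.
        Callable<Integer> flakyCount = () -> {
            if (++calls[0] < 3) {
                throw new TransientLookupException("table not found yet");
            }
            return 42;
        };
        int rows = retry(flakyCount, 5, 10L);
        System.out.println("rows=" + rows + " after " + calls[0] + " attempts");
    }
}
```

A wrapper like this would only mask the symptom in the test; it does not explain why the table
lookup fails in the first place.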

AssignmentManager.checkIfShouldMoveSystemRegionAsync() is called during active master
initialization and from RegionServerTracker.refresh(), and moveAsync() is used to submit the
procedure. This could explain the timing issue. If I cannot get to the bottom of this by tomorrow,
I will disable the test and continue working on it.

> Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
> ---------------------------------------------------------------------------------------------------------
>                 Key: HBASE-18366
>                 URL: https://issues.apache.org/jira/browse/HBASE-18366
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Umesh Agashe
>            Assignee: Umesh Agashe
> It worked for a few days after being enabled with HBASE-18278, but started failing after these commits:
> 6786b2b
> 68436c9
> 75d2eca
> 50bb045
> df93c13
> It works with one commit before: c5abb6c. Need to see what changed with those commits.
> Currently it fails with TableNotFoundException.

This message was sent by Atlassian JIRA