kudu-commits mailing list archives

From a...@apache.org
Subject [kudu] branch master updated: [java] deflake tests that use KuduTestHarness.findLeaderMasterServer
Date Fri, 25 Jan 2019 00:56:58 GMT
This is an automated email from the ASF dual-hosted git repository.

adar pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git


The following commit(s) were added to refs/heads/master by this push:
     new a7e3df1  [java] deflake tests that use KuduTestHarness.findLeaderMasterServer
a7e3df1 is described below

commit a7e3df11b239a9b54fe8accc0156129b43b33ac7
Author: Adar Dembo <adar@cloudera.com>
AuthorDate: Wed Jan 23 15:40:36 2019 -0800

    [java] deflake tests that use KuduTestHarness.findLeaderMasterServer
    
    From time to time I'd see test failures like these:
    
      10:10:16.018 [INFO - Test worker] (KuduTestHarness.java:147) Creating a new Kudu client...
      ...
      10:10:16.036 [WARN - New I/O worker #158] (ConnectToCluster.java:278) None of the provided masters 127.6.239.254:42291,127.6.239.252:41769,127.6.239.253:41053 is a leader; will retry
      ...
      10:10:16.060 [ERROR - Test worker] (RetryRule.java:80) testExportAuthenticationCredentialsDuringLeaderElection(org.apache.kudu.client.TestKuduClient): failed attempt 1
      org.apache.kudu.client.NoLeaderFoundException: Master config (127.6.239.254:42291,127.6.239.252:41769,127.6.239.253:41053) has no leader.
        at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:279)
        at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:47)
        at org.apache.kudu.client.ConnectToCluster$ConnectToMasterCB.call(ConnectToCluster.java:323)
        at org.apache.kudu.client.ConnectToCluster$ConnectToMasterCB.call(ConnectToCluster.java:312)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
        at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:247)
        at org.apache.kudu.client.KuduRpc.callback(KuduRpc.java:294)
        at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:269)
        at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:59)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:131)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:127)
        at org.apache.kudu.client.Connection.messageReceived(Connection.java:391)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.apache.kudu.client.Connection.handleUpstream(Connection.java:243)
        <more netty stack frames>
    
    I'd look through the code and wonder how this could happen if KUDU-2387 was
    indeed fixed. Today I finally noticed that
    KuduTestHarness.findLeaderMasterServer calls getMasterTableLocationsPB
    directly, and I remembered that without applying the same logic as in
    KUDU-2387, such calls will not retry. After adding a catch block to
    findLeaderMasterServer and transforming the thrown exception, I got a useful
    stack trace confirming the problem:
    
      10:51:53.627 [ERROR - Test worker] (RetryRule.java:80) testExportAuthenticationCredentialsDuringLeaderElection(org.apache.kudu.client.TestKuduClient): failed attempt 1
      org.apache.kudu.client.NoLeaderFoundException: Master config (127.11.27.62:40985,127.11.27.60:37593,127.11.27.61:37931) has no leader.
        at org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
        at org.apache.kudu.test.KuduTestHarness.findLeaderMasterServer(KuduTestHarness.java:281)
        at org.apache.kudu.test.KuduTestHarness.restartLeaderMaster(KuduTestHarness.java:329)
        at org.apache.kudu.client.TestKuduClient.runTestCallDuringLeaderElection(TestKuduClient.java:1124)
        at org.apache.kudu.client.TestKuduClient.testExportAuthenticationCredentialsDuringLeaderElection(TestKuduClient.java:1150)
        ...
        Suppressed: org.apache.kudu.client.KuduException$OriginalException: Original asynchronous stack trace
            at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:279)
            at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:47)
            at org.apache.kudu.client.ConnectToCluster$ConnectToMasterCB.call(ConnectToCluster.java:323)
            at org.apache.kudu.client.ConnectToCluster$ConnectToMasterCB.call(ConnectToCluster.java:312)
            at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
            at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
            at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
            at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:247)
            <...netty>
    
    This patch fixes the flakiness by providing an alternate way to find the
    leader master: if the leader is not already cached, issue an RPC that
    succeeds only once a leader master is known, then check the cache again.
    
    Without the fix, 29/1000 runs of TestKuduClient failed with this error,
    either in testExportAuthenticationCredentialsDuringLeaderElection or in
    testGetHiveMetastoreConfigDuringLeaderElection.
    
    With the fix, 0/1000 runs of TestKuduClient failed.
    
    Change-Id: I5612619d1b9e30df7d627f2370d60ce2aa812713
    Reviewed-on: http://gerrit.cloudera.org:8080/12263
    Tested-by: Kudu Jenkins
    Reviewed-by: Grant Henke <granthenke@apache.org>
    Reviewed-by: Alexey Serbin <aserbin@cloudera.com>
---
 .../org/apache/kudu/client/AsyncKuduClient.java    |  3 +--
 .../java/org/apache/kudu/client/KuduClient.java    | 28 ++++++++++++++++++++++
 .../java/org/apache/kudu/client/ServerInfo.java    |  8 +++++++
 .../java/org/apache/kudu/test/KuduTestHarness.java | 20 +---------------
 4 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java b/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
index 9b5b280..72b96a9 100644
--- a/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
+++ b/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
@@ -1711,8 +1711,7 @@ public class AsyncKuduClient implements AutoCloseable {
    * fill a {@link Master.GetTabletLocationsResponsePB} object.
    * @return An initialized Deferred object to hold the response.
    */
-  @InterfaceAudience.LimitedPrivate("Test")
-  public Deferred<Master.GetTableLocationsResponsePB> getMasterTableLocationsPB(KuduRpc<?> parentRpc) {
+  Deferred<Master.GetTableLocationsResponsePB> getMasterTableLocationsPB(KuduRpc<?> parentRpc) {
     // TODO(todd): stop using this 'masterTable' hack.
     return ConnectToCluster.run(masterTable, masterAddresses, parentRpc,
         defaultAdminOperationTimeoutMs, Connection.CredentialsPolicy.ANY_CREDENTIALS).addCallback(
diff --git a/java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java b/java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java
index b21bbfb..1e14a8f 100644
--- a/java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java
+++ b/java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java
@@ -21,6 +21,7 @@ import java.util.ArrayList;
 import java.util.List;
 import java.util.concurrent.Executor;
 
+import com.google.common.base.Preconditions;
 import com.stumbleupon.async.Callback;
 import com.stumbleupon.async.Deferred;
 import org.apache.yetus.audience.InterfaceAudience;
@@ -366,6 +367,33 @@ public class KuduClient implements AutoCloseable {
     return asyncClient.getMasterAddressesAsString();
   }
 
+  /**
+   * @return a HostAndPort describing the current leader master
+   * @throws KuduException if a leader master could not be found in time
+   */
+  @InterfaceAudience.LimitedPrivate("Test")
+  public HostAndPort findLeaderMasterServer() throws KuduException {
+    // Consult the cache to determine the current leader master.
+    //
+    // If one isn't found, issue an RPC that retries until the leader master
+    // is discovered. We don't need the RPC's results; it's just a simple way to
+    // wait until a leader master is elected.
+    TableLocationsCache.Entry entry = asyncClient.getTableLocationEntry(
+        AsyncKuduClient.MASTER_TABLE_NAME_PLACEHOLDER, null);
+    if (entry == null) {
+      // If there's no leader master, this will time out and throw an exception.
+      listTabletServers();
+
+      entry = asyncClient.getTableLocationEntry(
+          AsyncKuduClient.MASTER_TABLE_NAME_PLACEHOLDER, null);
+    }
+    Preconditions.checkNotNull(entry);
+    Preconditions.checkState(!entry.isNonCoveredRange());
+    ServerInfo info = entry.getTablet().getLeaderServerInfo();
+    Preconditions.checkNotNull(info);
+    return info.getHostAndPort();
+  }
+
   // Helper method to handle joining and transforming the Exception we receive.
   static <R> R joinAndHandleException(Deferred<R> deferred) throws KuduException {
     try {
diff --git a/java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java b/java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java
index cad4b21..1989794 100644
--- a/java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java
+++ b/java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java
@@ -86,6 +86,14 @@ public class ServerInfo {
   }
 
   /**
+   * Returns this server's hostname and port.
+   * @return a HostAndPort that describes where this server can be reached.
+   */
+  public HostAndPort getHostAndPort() {
+    return hostPort;
+  }
+
+  /**
    * Returns this server's port.
    * @return a port number that this server is bound to
    */
diff --git a/java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java b/java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
index ed45bb5..44f6a22 100644
--- a/java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
+++ b/java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
@@ -16,9 +16,6 @@
 // under the License.
 package org.apache.kudu.test;
 
-import com.google.common.base.Stopwatch;
-import com.stumbleupon.async.Deferred;
-import org.apache.kudu.Common;
 import org.apache.kudu.client.AsyncKuduClient;
 import org.apache.kudu.client.AsyncKuduClient.AsyncKuduClientBuilder;
 import org.apache.kudu.client.DeadlineTracker;
@@ -28,7 +25,6 @@ import org.apache.kudu.client.KuduException;
 import org.apache.kudu.client.KuduTable;
 import org.apache.kudu.client.LocatedTablet;
 import org.apache.kudu.client.RemoteTablet;
-import org.apache.kudu.master.Master;
 import org.apache.kudu.test.cluster.MiniKuduCluster;
 import org.apache.kudu.test.cluster.MiniKuduCluster.MiniKuduClusterBuilder;
 import org.apache.kudu.test.cluster.FakeDNS;
@@ -49,7 +45,6 @@ import java.lang.annotation.Target;
 import java.util.Arrays;
 import java.util.List;
 import java.util.Random;
-import java.util.concurrent.TimeUnit;
 
 import static org.junit.Assert.fail;
 
@@ -270,20 +265,7 @@ public class KuduTestHarness extends ExternalResource {
    * @throws Exception if we are unable to find the leader master
    */
   public HostAndPort findLeaderMasterServer() throws Exception {
-    Stopwatch sw = Stopwatch.createStarted();
-    while (sw.elapsed(TimeUnit.MILLISECONDS) < DEFAULT_SLEEP) {
-      Deferred<Master.GetTableLocationsResponsePB> masterLocD =
-          asyncClient.getMasterTableLocationsPB(null);
-      Master.GetTableLocationsResponsePB r = masterLocD.join(DEFAULT_SLEEP);
-      Common.HostPortPB pb = r.getTabletLocations(0)
-          .getReplicas(0)
-          .getTsInfo()
-          .getRpcAddresses(0);
-      if (pb.getPort() != -1) {
-        return new HostAndPort(pb.getHost(), pb.getPort());
-      }
-    }
-    throw new IOException(String.format("No leader master found after %d ms", DEFAULT_SLEEP));
+    return client.findLeaderMasterServer();
   }
 
   /**
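
Illustration only, not part of the commit: the commit message describes the fix as "consult the cache; if the leader is unknown, make a call that only succeeds once a leader is known, then consult the cache again". A minimal sketch of that control flow follows. Everything in it is hypothetical: FakeClient, callRequiringLeader, the attempt counter, and the address string are stand-ins invented for this example and are not Kudu APIs. The real patch uses listTabletServers() as the blocking call and the client's table locations cache.

```java
/** Hypothetical sketch of "check cache, block on a retrying call, check again". */
class LeaderLookupSketch {

  /** Stand-in for a client that caches the leader address after a successful RPC. */
  static class FakeClient {
    private String cachedLeader;              // null until a leader has been discovered
    private int attemptsUntilLeaderElected;   // attempts that will still see "no leader"

    FakeClient(int attemptsUntilLeaderElected) {
      this.attemptsUntilLeaderElected = attemptsUntilLeaderElected;
    }

    /** Cache lookup only; performs no retries. */
    String cachedLeader() {
      return cachedLeader;
    }

    /**
     * Stand-in for an RPC that retries internally until a leader is known.
     * On success it populates the cache as a side effect; otherwise it throws.
     */
    void callRequiringLeader(int maxAttempts) {
      for (int attempt = 0; attempt < maxAttempts; attempt++) {
        if (attemptsUntilLeaderElected == 0) {
          cachedLeader = "127.0.0.1:7051";    // made-up address for the example
          return;
        }
        attemptsUntilLeaderElected--;
      }
      throw new IllegalStateException(
          "no leader found after " + maxAttempts + " attempts");
    }
  }

  /** Returns the leader address, waiting for an election if necessary. */
  static String findLeader(FakeClient client, int maxAttempts) {
    String leader = client.cachedLeader();
    if (leader == null) {
      // The result of the call is irrelevant; it is only a way to wait
      // until a leader has been elected and cached.
      client.callRequiringLeader(maxAttempts);
      leader = client.cachedLeader();
    }
    if (leader == null) {
      throw new IllegalStateException("cache still empty after a successful call");
    }
    return leader;
  }

  public static void main(String[] args) {
    FakeClient client = new FakeClient(3);
    System.out.println(findLeader(client, 10));
  }
}
```

The point of the design is that the retry logic lives in one place (the call that already knows how to wait for a leader), so the lookup itself needs no polling loop or stopwatch of its own.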


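Illustration only, not part of the commit: the second stack trace in the commit message became useful because the exception was transformed on the calling thread, with the original asynchronous trace attached as a suppressed exception. The sketch below shows that general technique using only the JDK. It is an assumption-laden stand-in, not the implementation of KuduException.transformException: RethrowSketch, asyncLookup, and blockingLookup are hypothetical names, and CompletableFuture is used here in place of the Deferred type that the Kudu client uses.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

/** Hypothetical sketch: rethrow an async failure with the caller's stack attached. */
class RethrowSketch {

  /** Fails on a pool thread, so the exception's stack trace has no caller frames. */
  static CompletableFuture<String> asyncLookup() {
    return CompletableFuture.supplyAsync(() -> {
      throw new IllegalStateException("no leader");
    });
  }

  /** Blocks on the async call and rethrows failures from the calling thread. */
  static String blockingLookup() {
    try {
      return asyncLookup().get();
    } catch (ExecutionException e) {
      Throwable original = e.getCause();
      // A new exception created here records the caller's stack frames.
      IllegalStateException rethrown =
          new IllegalStateException(original.getMessage());
      // Keep the asynchronous trace so no information is lost.
      rethrown.addSuppressed(original);
      throw rethrown;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
  }

  public static void main(String[] args) {
    try {
      blockingLookup();
    } catch (IllegalStateException e) {
      // Prints the caller's frames first, then a "Suppressed:" section.
      e.printStackTrace();
    }
  }
}
```

With this shape, the top frames of the printed trace identify which caller triggered the failing lookup, which is the kind of information that pointed to findLeaderMasterServer in the trace above.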