kudu-commits mailing list archives

From ale...@apache.org
Subject [2/2] kudu git commit: KUDU-2580 [c++ client] authn token reacquisition fix
Date Wed, 19 Sep 2018 20:46:48 GMT
KUDU-2580 [c++ client] authn token reacquisition fix

This patch updates the authn token reacquisition logic to handle
the following scenario (an illustrative client-side sketch follows
the list):

1. Client is running against a multi-master cluster.
2. Client successfully authenticates and gets an authn token by calling
   ConnectToCluster().
3. Client keeps the connection to leader master open, but follower
   masters close connections to the client due to inactivity.
4. After the authn token expires, a change in the master leadership
   happens.
5. Client tries to open a table, making a request to the former leader
   master. The former leader returns NOT_THE_LEADER error.

The original authn token reacquisition logic simply retried RPCs upon
error responses with the FATAL_INVALID_AUTHENTICATION_TOKEN code, and
nothing else. The scenario described above was not handled properly,
so opening the table would fail with the following error:
   Timed out: GetTableSchema timed out after deadline expired

Under the hood, prior to this patch, the following would happen in
the above scenario when calling GetTableSchema() (a self-contained
toy model of this retry cycle follows the list):
  * With the connection to the former leader master still open, the
    authn token expired, and a re-election having selected another
    master as leader, the GetTableSchema() RPC call would hit the
    code of the last 'if ()' clause within the 'for ()' retry cycle
    in the KuduClient::Data::SyncLeaderMasterRpc() method,
    with error code MasterErrorPB::NOT_THE_LEADER.
  * The ConnectToMaster() attempt would return an error with the
    FATAL_INVALID_AUTHENTICATION_TOKEN error code, and the control
    flow would continue with the retries, but the state of the client
    would stay the same (i.e. the leader master proxy is not updated).
  * That would happen again and again, until the GetTableSchema() RPC
    call timed out.
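
To make the failure mode concrete, here is a self-contained toy model
of that retry cycle. All types and functions are simplified stand-ins,
not the real client code:

  #include <cstdio>

  enum class MasterError { OK, NOT_THE_LEADER };
  enum class RpcError { OK, FATAL_INVALID_AUTHENTICATION_TOKEN };

  // Stand-in for the RPC to the cached (former) leader master proxy:
  // it keeps answering NOT_THE_LEADER after the re-election.
  MasterError CallGetTableSchema() { return MasterError::NOT_THE_LEADER; }

  // Stand-in for ConnectToCluster() with the current (token-based)
  // credentials: the expired authn token makes every attempt fail,
  // so the leader master proxy is never updated.
  RpcError ConnectToCluster() {
    return RpcError::FATAL_INVALID_AUTHENTICATION_TOKEN;
  }

  int main() {
    const int kDeadlineRetries = 5;  // stands in for the RPC deadline
    for (int attempt = 0; attempt < kDeadlineRetries; ++attempt) {
      if (CallGetTableSchema() == MasterError::NOT_THE_LEADER) {
        // Pre-patch logic: retry ConnectToCluster() with the current
        // credentials only, log a warning on failure, and loop. The
        // proxy still points at the former leader, so the next attempt
        // fails the same way.
        if (ConnectToCluster() ==
            RpcError::FATAL_INVALID_AUTHENTICATION_TOKEN) {
          std::printf("attempt %d: still talking to the former leader\n",
                      attempt);
        }
        continue;
      }
      return 0;  // never reached in this model
    }
    std::printf(
        "Timed out: GetTableSchema timed out after deadline expired\n");
    return 1;
  }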

This patch also enables the ClientReacquiresAuthnToken scenario
of MultiMasterIdleConnectionsITest since it passes with this fix.

Change-Id: I4477d0f2bb36ee5ef580b585cae189a634d5002f
Reviewed-on: http://gerrit.cloudera.org:8080/11449
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/ffb1f22a
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/ffb1f22a
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/ffb1f22a

Branch: refs/heads/master
Commit: ffb1f22a70c8ef67fd4f8ab8d8a7203020674c4e
Parents: cc5c760
Author: Alexey Serbin <aserbin@cloudera.com>
Authored: Sun Sep 16 21:58:28 2018 -0700
Committer: Alexey Serbin <aserbin@cloudera.com>
Committed: Wed Sep 19 20:46:14 2018 +0000

----------------------------------------------------------------------
 src/kudu/client/client-internal.cc              | 56 ++++++++++++++------
 src/kudu/client/client-internal.h               | 12 +++++
 .../authn_token_expire-itest.cc                 |  2 +-
 3 files changed, 53 insertions(+), 17 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/ffb1f22a/src/kudu/client/client-internal.cc
----------------------------------------------------------------------
diff --git a/src/kudu/client/client-internal.cc b/src/kudu/client/client-internal.cc
index 0fb45ad..8300ecc 100644
--- a/src/kudu/client/client-internal.cc
+++ b/src/kudu/client/client-internal.cc
@@ -222,8 +222,7 @@ Status KuduClient::Data::SyncLeaderMasterRpc(
           << "): " << s.ToString();
       if (client->IsMultiMaster()) {
         LOG(INFO) << "Determining the new leader Master and retrying...";
-        WARN_NOT_OK(ConnectToCluster(client, deadline),
-                    "Unable to determine the new leader Master");
+        ReconnectToCluster(client, deadline, ReconnectionReason::OTHER);
         continue;
       }
     }
@@ -233,14 +232,8 @@ Status KuduClient::Data::SyncLeaderMasterRpc(
       if (err && err->has_code() &&
           err->code() == ErrorStatusPB::FATAL_INVALID_AUTHENTICATION_TOKEN) {
         // Assuming the token has expired: it's necessary to get a new one.
-        LOG(INFO) << "Reconnecting to the cluster for a new authn token";
-        const Status connect_status = ConnectToCluster(
-            client, deadline, CredentialsPolicy::PRIMARY_CREDENTIALS);
-        if (PREDICT_FALSE(!connect_status.ok())) {
-          KLOG_EVERY_N_SECS(WARNING, 1)
-              << "Unable to reconnect to the cluster for a new authn token: "
-              << connect_status.ToString();
-        }
+        ReconnectToCluster(client, deadline,
+                           ReconnectionReason::INVALID_AUTHN_TOKEN);
         continue;
       }
     }
@@ -252,8 +245,7 @@ Status KuduClient::Data::SyncLeaderMasterRpc(
           << "): " << s.ToString();
       if (client->IsMultiMaster()) {
         LOG(INFO) << "Determining the new leader Master and retrying...";
-        WARN_NOT_OK(ConnectToCluster(client, deadline),
-                    "Unable to determine the new leader Master");
+        ReconnectToCluster(client, deadline, ReconnectionReason::OTHER);
       }
       continue;
     }
@@ -266,8 +258,7 @@ Status KuduClient::Data::SyncLeaderMasterRpc(
             << "): " << s.ToString();
         if (client->IsMultiMaster()) {
           LOG(INFO) << "Determining the new leader Master and retrying...";
-          WARN_NOT_OK(ConnectToCluster(client, deadline),
-                      "Unable to determine the new leader Master");
+          ReconnectToCluster(client, deadline, ReconnectionReason::OTHER);
         }
         continue;
       } else {
@@ -282,8 +273,7 @@ Status KuduClient::Data::SyncLeaderMasterRpc(
           resp->error().code() == MasterErrorPB::CATALOG_MANAGER_NOT_INITIALIZED) {
         if (client->IsMultiMaster()) {
           KLOG_EVERY_N_SECS(INFO, 1) << "Determining the new leader Master and retrying...";
-          WARN_NOT_OK(ConnectToCluster(client, deadline),
-                      "Unable to determine the new leader Master");
+          ReconnectToCluster(client, deadline, ReconnectionReason::OTHER);
         }
         continue;
       } else {
@@ -831,6 +821,40 @@ void KuduClient::Data::ConnectToClusterAsync(KuduClient* client,
   }
 }
 
+void KuduClient::Data::ReconnectToCluster(KuduClient* client,
+                                          const MonoTime& deadline,
+                                          ReconnectionReason reason) {
+  DCHECK(client);
+  DCHECK(reason == ReconnectionReason::OTHER ||
+         reason == ReconnectionReason::INVALID_AUTHN_TOKEN);
+  if (reason == ReconnectionReason::OTHER) {
+    const auto s = ConnectToCluster(client, deadline,
+                                    CredentialsPolicy::ANY_CREDENTIALS);
+    if (s.ok()) {
+      return;
+    }
+    if (!s.IsNotAuthorized()) {
+      // A NotAuthorized() error is most likely due to an invalid
+      // authentication token. That's the only case when it's worth trying
+      // to re-connect to the cluster using primary credentials.
+      //
+      // TODO(aserbin): refactor ConnectToCluster to purge cached master proxy
+      //                in case of NOT_THE_LEADER error and update it to
+      //                handle FATAL_INVALID_AUTHENTICATION_TOKEN error as well.
+      WARN_NOT_OK(s, "Unable to determine the new leader Master");
+      return;
+    }
+  }
+  LOG(INFO) << "Reconnecting to the cluster for a new authn token";
+  const auto connect_status = ConnectToCluster(
+      client, deadline, CredentialsPolicy::PRIMARY_CREDENTIALS);
+  if (PREDICT_FALSE(!connect_status.ok())) {
+    KLOG_EVERY_N_SECS(WARNING, 1)
+        << "Unable to reconnect to the cluster for a new authn token: "
+        << connect_status.ToString();
+  }
+}
+
 HostPort KuduClient::Data::leader_master_hostport() const {
   std::lock_guard<simple_spinlock> l(leader_master_lock_);
   return leader_master_hostport_;

http://git-wip-us.apache.org/repos/asf/kudu/blob/ffb1f22a/src/kudu/client/client-internal.h
----------------------------------------------------------------------
diff --git a/src/kudu/client/client-internal.h b/src/kudu/client/client-internal.h
index 5958533..e359114 100644
--- a/src/kudu/client/client-internal.h
+++ b/src/kudu/client/client-internal.h
@@ -188,6 +188,18 @@ class KuduClient::Data {
       const MonoTime& deadline,
       rpc::CredentialsPolicy creds_policy = rpc::CredentialsPolicy::ANY_CREDENTIALS);
 
+  // A wrapper around ConnectToCluster() to handle various errors in case
+  // a call to the thought-to-be-leader master fails. Unless the
+  // INVALID_AUTHN_TOKEN reason is specified, this method first calls
+  // ConnectToCluster() with the current client credentials. If that fails
+  // with a NotAuthorized error (or if the reason is INVALID_AUTHN_TOKEN),
+  // it calls ConnectToCluster() with primary credentials.
+  // ReconnectionReason is a dedicated enumeration for the third parameter
+  // of the method.
+  enum class ReconnectionReason { INVALID_AUTHN_TOKEN, OTHER };
+  void ReconnectToCluster(KuduClient* client,
+                          const MonoTime& deadline,
+                          ReconnectionReason reason);
+
   std::shared_ptr<master::MasterServiceProxy> master_proxy() const;
 
   HostPort leader_master_hostport() const;

http://git-wip-us.apache.org/repos/asf/kudu/blob/ffb1f22a/src/kudu/integration-tests/authn_token_expire-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/authn_token_expire-itest.cc b/src/kudu/integration-tests/authn_token_expire-itest.cc
index 835c74e..c6dd663 100644
--- a/src/kudu/integration-tests/authn_token_expire-itest.cc
+++ b/src/kudu/integration-tests/authn_token_expire-itest.cc
@@ -484,7 +484,7 @@ class MultiMasterIdleConnectionsITest : public AuthnTokenExpireITestBase {
 // authn token. Prior to the KUDU-2580 fix, it didn't, and the test was failing
 // when the client tried to open the test table after master leader re-election:
 //   Timed out: GetTableSchema timed out after deadline expired
-TEST_F(MultiMasterIdleConnectionsITest, DISABLED_ClientReacquiresAuthnToken) {
+TEST_F(MultiMasterIdleConnectionsITest, ClientReacquiresAuthnToken) {
   const string kTableName = "keep-connection-to-former-master-leader";
 
   if (!AllowSlowTests()) {

