kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From a...@apache.org
Subject [1/3] kudu git commit: Test for bug in exactly-once during tablet bootstrap
Date Thu, 04 May 2017 20:12:19 GMT
Repository: kudu
Updated Branches:
  refs/heads/master a3c5983d8 -> 174a058e2


Test for bug in exactly-once during tablet bootstrap

Here's a regression test for the bug which is causing
raft_consensus-itest to occasionally think it has inserted 23 rows when
in fact it has only inserted 20.

The issue is in the rewriting of logs during bootstrap. If we do a write
which gets a duplicate key error, the first time the COMMIT message is
written, it includes the error.

When the server restarts, it writes the COMMIT message again with only
'flushed: true' in the commit message. This is enough for bootstrap to
know not to bother to replay it on subsequent restarts, but it has lost
the error messages themselves.

If the server restarts again, at this point it doesn't rebuild a proper
response, but instead puts an errorless response into the ResultTracker.

So, if an operation hits an error, and then the tablet server restarts
twice while the client is still retrying, the client will falsely think
that its operation has succeeded.

This includes a disabled regression test which shows the bug.

Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Reviewed-on: http://gerrit.cloudera.org:8080/5417
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/c36e6c40
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/c36e6c40
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/c36e6c40

Branch: refs/heads/master
Commit: c36e6c40568e64a75a75dc208238b0c8ea6bd870
Parents: a3c5983
Author: Todd Lipcon <todd@apache.org>
Authored: Thu Dec 8 19:36:32 2016 +0800
Committer: Todd Lipcon <todd@apache.org>
Committed: Thu May 4 19:37:51 2017 +0000

----------------------------------------------------------------------
 src/kudu/tserver/tablet_server-test-base.h |  6 ++-
 src/kudu/tserver/tablet_server-test.cc     | 60 +++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/c36e6c40/src/kudu/tserver/tablet_server-test-base.h
----------------------------------------------------------------------
diff --git a/src/kudu/tserver/tablet_server-test-base.h b/src/kudu/tserver/tablet_server-test-base.h
index 6ebf0e3..29495dc 100644
--- a/src/kudu/tserver/tablet_server-test-base.h
+++ b/src/kudu/tserver/tablet_server-test-base.h
@@ -387,8 +387,10 @@ class TabletServerTestBase : public KuduTest {
           rb_row.Reset(&block, i);
           VLOG(1) << "Verified row " << schema.DebugRow(rb_row);
           ASSERT_LT(count, expected.size()) << "Got more rows than expected!";
-          ASSERT_EQ(expected[count].first, *schema.ExtractColumnFromRow<INT32>(rb_row,
0));
-          ASSERT_EQ(expected[count].second, *schema.ExtractColumnFromRow<INT32>(rb_row,
1));
+          EXPECT_EQ(expected[count].first, *schema.ExtractColumnFromRow<INT32>(rb_row,
0))
+              << "Key mismatch at row: " << count;
+          EXPECT_EQ(expected[count].second, *schema.ExtractColumnFromRow<INT32>(rb_row,
1))
+              << "Value mismatch at row: " << count;
           count++;
         }
       }

http://git-wip-us.apache.org/repos/asf/kudu/blob/c36e6c40/src/kudu/tserver/tablet_server-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tserver/tablet_server-test.cc b/src/kudu/tserver/tablet_server-test.cc
index 2b41729..f4ba760 100644
--- a/src/kudu/tserver/tablet_server-test.cc
+++ b/src/kudu/tserver/tablet_server-test.cc
@@ -951,6 +951,66 @@ TEST_F(TabletServerTest, TestKUDU_1341) {
   ANFF(VerifyRows(schema_, { KeyValue(1, 12345) }));
 }
 
+TEST_F(TabletServerTest, DISABLED_TestExactlyOnceForErrorsAcrossRestart) {
+  WriteRequestPB req;
+  WriteResponsePB resp;
+  RpcController rpc;
+
+  // Set up a request to insert two rows.
+  req.set_tablet_id(kTabletId);
+  AddTestRowToPB(RowOperationsPB::INSERT, schema_, 1234, 5678, "hello world via RPC",
+                 req.mutable_row_operations());
+  AddTestRowToPB(RowOperationsPB::INSERT, schema_, 12345, 5679, "hello world via RPC2",
+                 req.mutable_row_operations());
+  ASSERT_OK(SchemaToPB(schema_, req.mutable_schema()));
+
+  // Insert it, assuming no errors.
+  {
+    SCOPED_TRACE(req.DebugString());
+    ASSERT_OK(proxy_->Write(req, &resp, &rpc));
+    SCOPED_TRACE(resp.DebugString());
+    ASSERT_FALSE(resp.has_error());
+    ASSERT_EQ(0, resp.per_row_errors_size());
+  }
+
+  // Set up a RequestID to use in the later requests.
+  rpc::RequestIdPB req_id;
+  req_id.set_client_id("client-id");
+  req_id.set_seq_no(1);
+  req_id.set_first_incomplete_seq_no(1);
+  req_id.set_attempt_no(1);
+
+  // Insert the row again, with the request ID specified. We should expect an
+  // "ALREADY_PRESENT" error.
+  {
+    rpc.Reset();
+    rpc.SetRequestIdPB(unique_ptr<rpc::RequestIdPB>(new rpc::RequestIdPB(req_id)));
+    ASSERT_OK(proxy_->Write(req, &resp, &rpc));
+    SCOPED_TRACE(resp.DebugString());
+    ASSERT_FALSE(resp.has_error());
+    ASSERT_EQ(2, resp.per_row_errors_size());
+  }
+
+  // Restart the tablet server several times, and after each restart, send a new attempt
of the
+  // same request. We make the request itself invalid by clearing the schema and ops, but
+  // that shouldn't matter since it's just hitting the ResultTracker and returning the
+  // cached response. If the ResultTracker didn't have a cached response, then we'd get an
+  // error about an invalid request.
+  req.clear_schema();
+  req.clear_row_operations();
+  for (int i = 1; i <= 5; i++) {
+    SCOPED_TRACE(Substitute("restart attempt #$0", i));
+    NO_FATALS(ShutdownAndRebuildTablet());
+    rpc.Reset();
+    req_id.set_attempt_no(req_id.attempt_no() + 1);
+    rpc.SetRequestIdPB(unique_ptr<rpc::RequestIdPB>(new rpc::RequestIdPB(req_id)));
+    ASSERT_OK(proxy_->Write(req, &resp, &rpc));
+    SCOPED_TRACE(resp.DebugString());
+    ASSERT_FALSE(resp.has_error());
+    ASSERT_EQ(2, resp.per_row_errors_size());
+  }
+}
+
 // Regression test for KUDU-177. Ensures that after a major delta compaction,
 // rows that were in the old DRS's DMS are properly replayed.
 TEST_F(TabletServerTest, TestKUDU_177_RecoveryOfDMSEditsAfterMajorDeltaCompaction) {


Mime
View raw message