kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From t...@apache.org
Subject incubator-kudu git commit: KUDU-1538: prevent block ID reuse to avoid potential data loss
Date Sat, 23 Jul 2016 01:58:43 GMT
Repository: incubator-kudu
Updated Branches:
  refs/heads/master ddc2047f5 -> 88a1cf0db


KUDU-1538: prevent block ID reuse to avoid potential data loss

This changes the block managers to allocate block IDs sequentially
rather than randomly. Given our 64-bit block IDs, this prevents ever
reusing an ID (it would take thousands of years even at unrealistically
high allocation rates).

The trickiness of this patch is that, in many unit tests, the BlockCache
singleton ends up persisting across multiple separate block managers.
Even though the test has torn down and recreated a new block manager,
the BlockCache continues to cache entries from the previous block manager.
With the block IDs starting from '1', we would be sure to have a collision
and many tests failed.

The workaround is for the LBM to notice when it is running in a gtest
(by way of some weak symbol magic) and start its allocation at a
random point in block space, rather than starting at 1.

For the FBM, I did a simpler workaround and just always started allocation
at a random point, since the FBM doesn't scan its block list at startup
and therefore would likely suffer collisions at startup even in the
'normal' (non-test) case.

Unfortunately there's no real way to write a regression test for this:
it would only produce itself after inserting tens of terabytes of data
in the presence of lots of remote bootstraps, etc.

Change-Id: Id45bf81bd6bccd51937c358716ace895ccee469c
Reviewed-on: http://gerrit.cloudera.org:8080/3719
Reviewed-by: Adar Dembo <adar@cloudera.com>
Reviewed-by: Dan Burkert <dan@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/88a1cf0d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/88a1cf0d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/88a1cf0d

Branch: refs/heads/master
Commit: 88a1cf0db2d4cb912a65be239492a3d3fceed69c
Parents: ddc2047
Author: Todd Lipcon <todd@apache.org>
Authored: Wed Jul 20 21:09:35 2016 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Sat Jul 23 01:58:13 2016 +0000

----------------------------------------------------------------------
 src/kudu/fs/block_manager-test.cc |  9 +++------
 src/kudu/fs/file_block_manager.cc | 10 +++++++++-
 src/kudu/fs/file_block_manager.h  |  2 ++
 src/kudu/fs/log_block_manager.cc  | 30 ++++++++++++++++++++++++++++--
 src/kudu/fs/log_block_manager.h   |  2 +-
 src/kudu/util/test_util.cc        |  7 +++++++
 6 files changed, 50 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/88a1cf0d/src/kudu/fs/block_manager-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/fs/block_manager-test.cc b/src/kudu/fs/block_manager-test.cc
index 81af5b8..8a0385e 100644
--- a/src/kudu/fs/block_manager-test.cc
+++ b/src/kudu/fs/block_manager-test.cc
@@ -720,10 +720,6 @@ class LogBlockManagerTest : public BlockManagerTest<LogBlockManager>
{
 // reused.
 TEST_F(LogBlockManagerTest, TestReuseBlockIds) {
   RETURN_NOT_LOG_BLOCK_MANAGER();
-
-  // Set a deterministic random seed, so that we can reproduce the sequence
-  // of random numbers.
-  bm_->rand_.Reset(1);
   vector<BlockId> block_ids;
 
   // Create 4 containers, with the first four block IDs in the random sequence.
@@ -753,8 +749,9 @@ TEST_F(LogBlockManagerTest, TestReuseBlockIds) {
   }
 
   // Reset the random seed and re-create new blocks which should reuse the same
-  // block IDs (allowed since the blocks were deleted).
-  bm_->rand_.Reset(1);
+  // block IDs. This isn't allowed in current versions of Kudu, but older versions
+  // could produce this situation, and we still need to handle it on startup.
+  bm_->next_block_id_.Store(1);
   for (int i = 0; i < 4; i++) {
     gscoped_ptr<WritableBlock> writer;
     ASSERT_OK(bm_->CreateBlock(&writer));

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/88a1cf0d/src/kudu/fs/file_block_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/fs/file_block_manager.cc b/src/kudu/fs/file_block_manager.cc
index 226e107..53d0352 100644
--- a/src/kudu/fs/file_block_manager.cc
+++ b/src/kudu/fs/file_block_manager.cc
@@ -510,6 +510,7 @@ FileBlockManager::FileBlockManager(Env* env, const BlockManagerOptions&
opts)
     read_only_(opts.read_only),
     root_paths_(opts.root_paths),
     rand_(GetRandomSeed32()),
+    next_block_id_(rand_.Next64()),
     mem_tracker_(MemTracker::CreateTracker(-1,
                                            "file_block_manager",
                                            opts.parent_mem_tracker)) {
@@ -664,15 +665,22 @@ Status FileBlockManager::CreateBlock(const CreateBlockOptions& opts,
   internal::FileBlockLocation location;
   shared_ptr<WritableFile> writer;
 
+  int attempt_num = 0;
   // Repeat in case of block id collisions (unlikely).
   do {
     created_dirs.clear();
 
+    // If we failed to generate a unique ID, start trying again from a random
+    // part of the key space.
+    if (attempt_num++ > 0) {
+      next_block_id_.Store(rand_.Next64());
+    }
+
     // Make sure we don't accidentally create a location using the magic
     // invalid ID value.
     BlockId id;
     do {
-      id.SetId(rand_.Next64());
+      id.SetId(next_block_id_.Increment());
     } while (id.IsNull());
 
     location = internal::FileBlockLocation::FromParts(

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/88a1cf0d/src/kudu/fs/file_block_manager.h
----------------------------------------------------------------------
diff --git a/src/kudu/fs/file_block_manager.h b/src/kudu/fs/file_block_manager.h
index 4d4a90d..65fa2de 100644
--- a/src/kudu/fs/file_block_manager.h
+++ b/src/kudu/fs/file_block_manager.h
@@ -26,6 +26,7 @@
 
 #include "kudu/fs/block_id.h"
 #include "kudu/fs/block_manager.h"
+#include "kudu/util/atomic.h"
 #include "kudu/util/locks.h"
 #include "kudu/util/random.h"
 
@@ -125,6 +126,7 @@ class FileBlockManager : public BlockManager {
 
   // For generating block IDs.
   ThreadSafeRandom rand_;
+  AtomicInt<int64_t> next_block_id_;
 
   // Protects 'dirty_dirs_' and 'next_root_path_'.
   mutable simple_spinlock lock_;

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/88a1cf0d/src/kudu/fs/log_block_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/fs/log_block_manager.cc b/src/kudu/fs/log_block_manager.cc
index 928aca0..7890964 100644
--- a/src/kudu/fs/log_block_manager.cc
+++ b/src/kudu/fs/log_block_manager.cc
@@ -17,6 +17,7 @@
 
 #include "kudu/fs/log_block_manager.h"
 
+#include <algorithm>
 #include <mutex>
 
 #include "kudu/fs/block_manager_metrics.h"
@@ -110,6 +111,9 @@ using std::unordered_set;
 using strings::Substitute;
 
 namespace kudu {
+
+ATTRIBUTE_WEAK bool g_is_gtest;
+
 namespace fs {
 namespace internal {
 
@@ -1089,7 +1093,20 @@ LogBlockManager::LogBlockManager(Env* env, const BlockManagerOptions&
opts)
     read_only_(opts.read_only),
     root_paths_(opts.root_paths),
     root_paths_idx_(0),
-    rand_(GetRandomSeed32()) {
+    next_block_id_(1) {
+
+  // HACK: when running in a test environment, we often instantiate many
+  // LogBlockManagers in the same process, eg corresponding to different
+  // tablet servers in a minicluster, or due to running many separate test
+  // cases of some CFile-related code. In that case, we need to make it more
+  // likely that the block IDs are not reused. So, instead of starting with
+  // block ID 1, we'll start with a random block ID. A collision is still
+  // possible, but exceedingly unlikely.
+  if (g_is_gtest) {
+    Random r(GetRandomSeed32());
+    next_block_id_.Store(r.Next64());
+  }
+
   DCHECK_GT(root_paths_.size(), 0);
   if (opts.metric_entity) {
     metrics_.reset(new internal::LogBlockManagerMetrics(opts.metric_entity));
@@ -1322,9 +1339,11 @@ Status LogBlockManager::CreateBlock(const CreateBlockOptions& opts,
   }
 
   // Generate a free block ID.
+  // We have to loop here because earlier versions used non-sequential block IDs,
+  // and thus we may have to "skip over" some block IDs that are claimed.
   BlockId new_block_id;
   do {
-    new_block_id.SetId(rand_.Next64());
+    new_block_id.SetId(next_block_id_.Increment());
   } while (!TryUseBlockId(new_block_id));
 
   block->reset(new internal::LogWritableBlock(container,
@@ -1637,10 +1656,17 @@ void LogBlockManager::OpenRootPath(const string& root_path,
     // the second container, we'd think there was a duplicate block. Building
     // the container-local map first ensures that we discount deleted blocks
     // before checking for duplicate IDs.
+    //
+    // NOTE: Since KUDU-1538, we allocate sequential block IDs, which makes reuse
+    // exceedingly unlikely. However, we might have old data which still exhibits
+    // the above issue.
     UntrackedBlockMap blocks_in_container;
+    uint64_t max_block_id = 0;
     for (const BlockRecordPB& r : records) {
       ProcessBlockRecord(r, container.get(), &blocks_in_container);
+      max_block_id = std::max(max_block_id, r.block_id().id());
     }
+    next_block_id_.StoreMax(max_block_id + 1);
 
     // Under the lock, merge this map into the main block map and add
     // the container.

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/88a1cf0d/src/kudu/fs/log_block_manager.h
----------------------------------------------------------------------
diff --git a/src/kudu/fs/log_block_manager.h b/src/kudu/fs/log_block_manager.h
index 01672ec..ec8306a 100644
--- a/src/kudu/fs/log_block_manager.h
+++ b/src/kudu/fs/log_block_manager.h
@@ -373,7 +373,7 @@ class LogBlockManager : public BlockManager {
   ObjectIdGenerator oid_generator_;
 
   // For generating block IDs.
-  ThreadSafeRandom rand_;
+  AtomicInt<int64_t> next_block_id_;
 
   // Metrics for the block manager.
   //

http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/88a1cf0d/src/kudu/util/test_util.cc
----------------------------------------------------------------------
diff --git a/src/kudu/util/test_util.cc b/src/kudu/util/test_util.cc
index 51746ee..46f4714 100644
--- a/src/kudu/util/test_util.cc
+++ b/src/kudu/util/test_util.cc
@@ -43,6 +43,13 @@ static const char* const kSlowTestsEnvVariable = "KUDU_ALLOW_SLOW_TESTS";
 
 static const uint64 kTestBeganAtMicros = Env::Default()->NowMicros();
 
+// Global which production code can check to see if it is running
+// in a GTest environment (assuming the test binary links in this module,
+// which is typically a good assumption). This can be imported
+// as a weak symbol such that, if it's not linked in, the reader sees
+// 'false'.
+bool g_is_gtest = true;
+
 ///////////////////////////////////////////////////
 // KuduTest
 ///////////////////////////////////////////////////


Mime
View raw message