kudu-commits mailing list archives

From t...@apache.org
Subject [1/2] incubator-kudu git commit: KUDU-815. Improve performance of first scan following restart
Date Fri, 06 May 2016 21:50:42 GMT
Repository: incubator-kudu
Updated Branches:
  refs/heads/master a48642213 -> db235c165

KUDU-815. Improve performance of first scan following restart

On the first scan following a tablet server restart, the TS has not read
deltafile stats for any delta files. This means that, when we construct
DeltaFileIterators to service a scan, we don't yet know whether the files
are even relevant given the MVCC snapshot that is being scanned.

Prior to this patch, we only attempted to cull irrelevant DeltaFiles at
iterator construction time and, without stats, were unable to do so.
With this patch, we check again when the iterator is seeked and, if the
file is irrelevant, preemptively mark it as "exhausted", which prevents
any needless IO.

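As a rough illustration of the idea (not the actual Kudu code), the
sketch below models a delta file whose stats are loaded lazily and an
iterator that re-checks relevance at seek time. FakeDeltaFile,
FakeDeltaIterator, the single-timestamp Snapshot, and the
"min timestamp below the snapshot" rule are all hypothetical
simplifications made up for this example; only the names
IsRelevantForSnapshot, SeekToOrdinal, and exhausted_ come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for illustration; not Kudu's real classes.
struct Snapshot {
  // Simplification: every timestamp strictly below as_of is visible.
  uint64_t as_of;
};

class FakeDeltaFile {
 public:
  explicit FakeDeltaFile(uint64_t min_ts) : min_ts_(min_ts) {}

  // Stands in for lazily reading the delta stats from disk.
  void LoadStats() { stats_loaded_ = true; }

  // Without stats we cannot rule the file out, so we must answer "relevant".
  bool IsRelevantForSnapshot(const Snapshot& snap) const {
    if (!stats_loaded_) return true;
    return min_ts_ < snap.as_of;
  }

  // Stands in for the IO needed to open the file's index.
  void ReadIndex() { ++index_reads_; }
  int index_reads() const { return index_reads_; }

 private:
  uint64_t min_ts_;
  bool stats_loaded_ = false;
  int index_reads_ = 0;
};

class FakeDeltaIterator {
 public:
  FakeDeltaIterator(FakeDeltaFile* file, Snapshot snap)
      : file_(file), snap_(snap) {}

  void SeekToOrdinal() {
    // Finish lazy initialization, then repeat the relevance check that
    // could not be answered precisely at construction time.
    file_->LoadStats();
    if (!file_->IsRelevantForSnapshot(snap_)) {
      exhausted_ = true;  // skip all further IO for this file
      return;
    }
    file_->ReadIndex();
    index_ready_ = true;
  }

  // Mirrors the relaxed precondition in the patch: either the index was
  // opened or the iterator was marked exhausted.
  bool HasMoreDeltas() const {
    assert(exhausted_ || index_ready_);
    return !exhausted_;
  }

  bool exhausted() const { return exhausted_; }

 private:
  FakeDeltaFile* file_;
  Snapshot snap_;
  bool exhausted_ = false;
  bool index_ready_ = false;
};
```

In this model, a file whose earliest delta is newer than the snapshot
looks relevant until its stats are loaded; the seek-time check then
marks the iterator exhausted before any index read is issued.
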
To benchmark, I loaded a 1GB TPCH lineitem on a local tserver and looked
at the performance of the first scan.

without patch:
todd@todd-ThinkPad-T540p:~/git/kudu$ ./build/release/bin/tpch_real_world  --tpch_load_data=0
I0505 16:15:28.855382 32209 tpch_real_world.cc:307] Time spent querying data in cluster: real 1.966s    user 0.112s     sys 0.000s
I0505 16:15:29.598799 32209 tpch_real_world.cc:307] Time spent querying data in cluster: real 0.743s    user 0.100s     sys 0.000s

with patch:
todd@todd-ThinkPad-T540p:~/git/kudu$ ./build/release/bin/tpch_real_world  --tpch_load_data=0
I0505 16:14:31.102988 31545 tpch_real_world.cc:307] Time spent querying data in cluster: real 0.924s    user 0.096s     sys 0.008s

There is still a slight performance difference between the first scan after a
restart and the second due to cold caches, but the difference is now much smaller.

Change-Id: Icd01302723430e5b06308256bbbbb790aee096fc
Reviewed-on: http://gerrit.cloudera.org:8080/2974
Tested-by: Kudu Jenkins
Reviewed-by: Jean-Daniel Cryans

Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/9c334d20
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/9c334d20
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/9c334d20

Branch: refs/heads/master
Commit: 9c334d2021dfd297ab677a403cc8a07099cc475b
Parents: a486422
Author: Todd Lipcon <todd@apache.org>
Authored: Thu May 5 16:14:51 2016 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Fri May 6 21:45:38 2016 +0000

 src/kudu/tablet/deltafile.cc | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/kudu/tablet/deltafile.cc b/src/kudu/tablet/deltafile.cc
index 1b3cfef..d630f85 100644
--- a/src/kudu/tablet/deltafile.cc
+++ b/src/kudu/tablet/deltafile.cc
@@ -363,6 +363,17 @@ Status DeltaFileIterator::SeekToOrdinal(rowid_t idx) {
   // Finish the initialization of any lazily-initialized state.
+  // Check again whether this delta file is relevant given the snapshot
+  // that we are querying. We did this already before creating the
+  // DeltaFileIterator, but due to lazy initialization, it's possible
+  // that we weren't able to check at that time.
+  if (!dfr_->IsRelevantForSnapshot(mvcc_snap_)) {
+    exhausted_ = true;
+    delta_blocks_.clear();
+    return Status::OK();
+  }
   if (!index_iter_) {
@@ -450,7 +461,7 @@ string DeltaFileIterator::PreparedDeltaBlock::ToString() const {
 Status DeltaFileIterator::PrepareBatch(size_t nrows, PrepareFlag flag) {
   DCHECK(initted_) << "Must call Init()";
-  DCHECK(index_iter_) << "Must call SeekToOrdinal()";
+  DCHECK(exhausted_ || index_iter_) << "Must call SeekToOrdinal()";
   CHECK_GT(nrows, 0);
