kudu-commits mailing list archives

From danburk...@apache.org
Subject kudu git commit: Add 'kudu fs list' tool
Date Tue, 09 Jan 2018 20:17:25 GMT
Repository: kudu
Updated Branches:
  refs/heads/branch-1.6.x d6d2f7bf7 -> 94136dbbd


Add 'kudu fs list' tool

This tool aims to replace exploratory usages of 'kudu fs dump' and
'kudu local_replica dump' with an improved, unified tool. 'kudu fs list' is
more flexible, easier to use, and can show more information.

Output is formatted using the DataTable abstraction, which gives it
good default pretty-printing, with options to output in CSV and JSON for
scripts. Results can easily be filtered to a specific table, tablet, column,
rowset, or block using flags.
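
As a rough illustration of the filter flags, the sketch below mirrors the
sentinel defaults declared in this patch (rowset_id = -1, column_id = -1,
block_id = 0), where the sentinel means "no filter". The Filters struct and
the *Matches helpers are hypothetical names used only for this sketch; the
patch reads the gflags directly.

```cpp
#include <cstdint>

// Hypothetical bundle of the filter flags; defaults match the patch's
// DEFINE_int64(rowset_id, -1), DEFINE_int32(column_id, -1) and
// DEFINE_uint64(block_id, 0).
struct Filters {
  int64_t rowset_id = -1;  // -1 means "do not filter by rowset"
  int32_t column_id = -1;  // negative means "do not filter by column"
  uint64_t block_id = 0;   // 0 means "do not filter by block"
};

// A rowset passes when no rowset filter is set or the IDs are equal.
bool RowsetMatches(const Filters& f, int64_t rowset_id) {
  return f.rowset_id == -1 || f.rowset_id == rowset_id;
}

// A column passes when no column filter is set or the IDs are equal.
bool ColumnMatches(const Filters& f, int32_t column_id) {
  return f.column_id < 0 || f.column_id == column_id;
}

// A block passes when no block filter is set or the IDs are equal.
bool BlockMatches(const Filters& f, uint64_t block_id) {
  return f.block_id == 0 || f.block_id == block_id;
}
```

With the defaults every rowset, column, and block passes; setting a single
flag narrows only that dimension.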

The tool can output many different fields: table, table-id, tablet-id,
partition, rowset-id, block-id, block-kind, column, column-id,
cfile-data-type, cfile-encoding, cfile-compression, cfile-num-values,
cfile-size, cfile-incompatible-features, cfile-compatible-features,
cfile-min-key, cfile-max-key, and cfile-delta-stats. More fields should
be straightforward to add.
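
Field names passed to --columns are matched leniently. As far as can be read
from ParseField in this patch, surrounding whitespace is stripped, underscores
become hyphens, and the name is lowercased before lookup. The following is a
minimal standalone sketch of that idea using only the standard library; the
three-variant Field enum and the Normalize helper are simplifications made for
this sketch, not the patch's code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Simplified stand-in for the patch's Field enum (only three variants).
enum class Field { kTable, kTabletId, kCFileSize };

// Array of variants to search, analogous to kFieldVariants in the patch.
const Field kVariants[] = {Field::kTable, Field::kTabletId, Field::kCFileSize};

// Returns the user-visible name of a field.
const char* ToString(Field field) {
  switch (field) {
    case Field::kTable: return "table";
    case Field::kTabletId: return "tablet-id";
    case Field::kCFileSize: return "cfile-size";
  }
  return "";
}

// Trims surrounding whitespace, maps '_' to '-', and lowercases ASCII.
std::string Normalize(std::string name) {
  auto not_space = [](unsigned char c) { return !std::isspace(c); };
  name.erase(name.begin(), std::find_if(name.begin(), name.end(), not_space));
  name.erase(std::find_if(name.rbegin(), name.rend(), not_space).base(),
             name.end());
  std::replace(name.begin(), name.end(), '_', '-');
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return name;
}

// Returns true and sets *field if the normalized name matches a variant.
bool ParseField(const std::string& raw, Field* field) {
  const std::string name = Normalize(raw);
  for (Field variant : kVariants) {
    if (name == ToString(variant)) {
      *field = variant;
      return true;
    }
  }
  return false;
}
```

Under this scheme " Tablet_ID " and "tablet-id" resolve to the same field,
which is why the examples below can put spaces after the commas.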

The tool transparently joins information from tablet superblocks with
CFile footers, only materializing the metadata necessary to satisfy the
requested fields and filters.
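
The "only materialize what is needed" behaviour comes from picking the most
detailed field group among the requested fields: tablet-only requests never
touch rowsets, and CFiles are opened only when a cfile-* field is requested.
A minimal sketch of that selection follows, assuming one representative field
per group; the reduced Field enum and the MostDetailedGroup name are
inventions of this sketch, while the max-element trick matches what List()
does in the patch.

```cpp
#include <algorithm>
#include <vector>

// Groups in order of increasing detail (and increasing metadata cost).
enum class FieldGroup { kTablet, kRowset, kBlock, kCFile };

// Simplified: one representative field per group, declared in the same
// order as the groups so that comparing fields also compares detail levels.
enum class Field { kTabletId, kRowsetId, kBlockId, kCFileSize };

// Maps a field to the group whose metadata is needed to produce it.
FieldGroup ToFieldGroup(Field field) {
  switch (field) {
    case Field::kTabletId: return FieldGroup::kTablet;
    case Field::kRowsetId: return FieldGroup::kRowset;
    case Field::kBlockId: return FieldGroup::kBlock;
    case Field::kCFileSize: return FieldGroup::kCFile;
  }
  return FieldGroup::kCFile;
}

// Returns the group of the most detailed requested field.
// Precondition: 'fields' is non-empty.
FieldGroup MostDetailedGroup(const std::vector<Field>& fields) {
  return ToFieldGroup(*std::max_element(fields.begin(), fields.end()));
}
```

This relies on the enum declaration order, so fields must be declared from
least to most detailed.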

Examples:

To get our bearings, let's look at what tablets are stored on a local
tablet server:

```bash
$ kudu fs list --fs-wal-dir=/data/kudu/tserver \
    --columns="table, table-id, tablet-id, partition"

                     table                     |             table-id             |            tablet-id             |                        partition
-----------------------------------------------+----------------------------------+----------------------------------+---------------------------------------------------------
 loadgen_auto_06c8038c02da40048397e4f6ad1662c3 | 84ff589b979e4f90aa630e7179fcb644 | 2a631714f2d243ff92bf525630baa1ec | HASH (key) PARTITION 7, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_fa03aaf7bdf54bb4896c534f38d177a1 | 800af247c1424ecd8e96a37b5ee4d311 | 36827286a00049bc8b242243c6728157 | HASH (key) PARTITION 0, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_06c8038c02da40048397e4f6ad1662c3 | 84ff589b979e4f90aa630e7179fcb644 | 3880b30ccebd4ede867febd9c7d5580f | HASH (key) PARTITION 0, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_06c8038c02da40048397e4f6ad1662c3 | 84ff589b979e4f90aa630e7179fcb644 | 39436e9e17d84884b1cb689e88b8415f | HASH (key) PARTITION 5, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_fa03aaf7bdf54bb4896c534f38d177a1 | 800af247c1424ecd8e96a37b5ee4d311 | 44252efb9aaa4c2c963cf6dd5e875c04 | HASH (key) PARTITION 3, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_06c8038c02da40048397e4f6ad1662c3 | 84ff589b979e4f90aa630e7179fcb644 | 57c92ed8391b4d2bbfdeb339f9fb59fd | HASH (key) PARTITION 2, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_fa03aaf7bdf54bb4896c534f38d177a1 | 800af247c1424ecd8e96a37b5ee4d311 | 68a64aba5917499ebb7773f16bcd6f6d | HASH (key) PARTITION 7, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_06c8038c02da40048397e4f6ad1662c3 | 84ff589b979e4f90aa630e7179fcb644 | 6b5f0729a9bf454791239f77b0912f4e | HASH (key) PARTITION 1, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_06c8038c02da40048397e4f6ad1662c3 | 84ff589b979e4f90aa630e7179fcb644 | 8a2d120bd6984144ae963bfe8435206e | HASH (key) PARTITION 4, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_fa03aaf7bdf54bb4896c534f38d177a1 | 800af247c1424ecd8e96a37b5ee4d311 | 8b3ba4f415f945849a6a690a142cf1e4 | HASH (key) PARTITION 5, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_fa03aaf7bdf54bb4896c534f38d177a1 | 800af247c1424ecd8e96a37b5ee4d311 | 9656be3aa07248a69e3ad6edaa0048cb | HASH (key) PARTITION 1, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_fa03aaf7bdf54bb4896c534f38d177a1 | 800af247c1424ecd8e96a37b5ee4d311 | 9e8e444d079842a9b4a83ee9f8bed633 | HASH (key) PARTITION 6, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_06c8038c02da40048397e4f6ad1662c3 | 84ff589b979e4f90aa630e7179fcb644 | a794a8e5d3f24e70a96b0beb5a355823 | HASH (key) PARTITION 3, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_fa03aaf7bdf54bb4896c534f38d177a1 | 800af247c1424ecd8e96a37b5ee4d311 | bfb8f24b91cd4ecf924aacbb37125041 | HASH (key) PARTITION 2, RANGE (key) PARTITION UNBOUNDED
 foo                                           | e184a99893b44b17a7b2131123c6de0e | c3ce418c72ab4fea8548387f236dd1fa |
 loadgen_auto_fa03aaf7bdf54bb4896c534f38d177a1 | 800af247c1424ecd8e96a37b5ee4d311 | e00a284081ca468a994a3609a511e886 | HASH (key) PARTITION 4, RANGE (key) PARTITION UNBOUNDED
 loadgen_auto_06c8038c02da40048397e4f6ad1662c3 | 84ff589b979e4f90aa630e7179fcb644 | efa22fc899a44bb2a16f620464a15c60 | HASH (key) PARTITION 6, RANGE (key) PARTITION UNBOUNDED
```

The 'foo' table looks interesting; let's drill down into its tablet, and
see what rowsets and blocks it has, and some of their associated metadata:

```bash
$ kudu fs list --fs-wal-dir=/data/kudu/tserver \
    --columns="rowset-id, column, column-id, block-kind, block-id" \
    --tablet-id=c3ce418c72ab4fea8548387f236dd1fa

 rowset-id | column | column-id | block-kind  |    block-id
-----------+--------+-----------+-------------+----------------
 0         | k1     | 10        | column      | 90680632611552
 0         | k2     | 11        | column      | 90680632611553
 0         | k3     | 12        | column      | 90680632611554
 0         | k4     | 13        | column      | 90680632611555
 0         | v1     | 14        | column      | 90680632611556
 0         | v2     | 15        | column      | 90680632611557
 0         | v3     | 16        | column      | 90680632611558
 0         | v4     | 17        | column      | 90680632611559
 0         |        |           | bloom       | 90680632611560
 0         |        |           | adhoc-index | 90680632611561
 1         | k1     | 10        | column      | 90680632611564
 1         | k2     | 11        | column      | 90680632611565
 1         | k3     | 12        | column      | 90680632611566
 1         | k4     | 13        | column      | 90680632611567
 1         | v1     | 14        | column      | 90680632611568
 1         | v2     | 15        | column      | 90680632611569
 1         | v3     | 16        | column      | 90680632611570
 1         | v4     | 17        | column      | 90680632611571
 1         |        |           | bloom       | 90680632611572
 1         |        |           | adhoc-index | 90680632611573
```

We can immediately see that this tablet has two rowsets, each of which
has 8 column blocks, a bloom block, and an ad-hoc index block. Let's
drill down further and inspect the 'v4' column:

```bash
$ kudu fs list --fs-wal-dir=<> \
    --columns="block-id, cfile-data-type, cfile-encoding, cfile-compression, cfile-num-values, cfile-size" \
    --tablet-id=c3ce418c72ab4fea8548387f236dd1fa \
    --column-id=17

    block-id    | cfile-data-type | cfile-encoding | cfile-compression | cfile-num-values | cfile-size
----------------+-----------------+----------------+-------------------+------------------+------------
 90680632611555 | int64           | BIT_SHUFFLE    | NO_COMPRESSION    | 5.09M            | 782.6K
 90680632611567 | int64           | BIT_SHUFFLE    | NO_COMPRESSION    | 5.40M            | 830.1K
```

And we can immediately see the CFile's on-disk encoding and compression,
the number of cells, and the CFile/block size.

Change-Id: I7f5a63e636d95e3ee55bb4955cece7f5d0b7532d
Reviewed-on: http://gerrit.cloudera.org:8080/8911
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Kudu Jenkins
(cherry picked from commit 36995e26022d9e54692e04e3428c5993b523a733)
Reviewed-on: http://gerrit.cloudera.org:8080/8967
Reviewed-by: Todd Lipcon <todd@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/94136dbb
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/94136dbb
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/94136dbb

Branch: refs/heads/branch-1.6.x
Commit: 94136dbbd3cf63b65e7c65a0588940a2775e959b
Parents: d6d2f7b
Author: Dan Burkert <danburkert@apache.org>
Authored: Thu Dec 21 11:24:17 2017 -0800
Committer: Dan Burkert <danburkert@apache.org>
Committed: Tue Jan 9 20:17:02 2018 +0000

----------------------------------------------------------------------
 src/kudu/cfile/cfile_reader.cc        |   2 +-
 src/kudu/cfile/cfile_reader.h         |   2 +-
 src/kudu/gutil/strings/join.h         |  15 +-
 src/kudu/tablet/rowset_metadata.h     |   2 +-
 src/kudu/tools/kudu-tool-test.cc      |  71 +++-
 src/kudu/tools/tool_action.cc         |   3 +-
 src/kudu/tools/tool_action_fs.cc      | 499 ++++++++++++++++++++++++++++-
 src/kudu/tools/tool_action_tserver.cc |  16 +-
 8 files changed, 587 insertions(+), 23 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/94136dbb/src/kudu/cfile/cfile_reader.cc
----------------------------------------------------------------------
diff --git a/src/kudu/cfile/cfile_reader.cc b/src/kudu/cfile/cfile_reader.cc
index 6f19487..91b3cb5 100644
--- a/src/kudu/cfile/cfile_reader.cc
+++ b/src/kudu/cfile/cfile_reader.cc
@@ -555,7 +555,7 @@ Status CFileReader::CountRows(rowid_t *count) const {
   return Status::OK();
 }
 
-bool CFileReader::GetMetadataEntry(const string &key, string *val) {
+bool CFileReader::GetMetadataEntry(const string &key, string *val) const {
   for (const FileMetadataPairPB &pair : header().metadata()) {
     if (pair.key() == key) {
       *val = pair.value();

http://git-wip-us.apache.org/repos/asf/kudu/blob/94136dbb/src/kudu/cfile/cfile_reader.h
----------------------------------------------------------------------
diff --git a/src/kudu/cfile/cfile_reader.h b/src/kudu/cfile/cfile_reader.h
index 42d7b0b..58fb60f 100644
--- a/src/kudu/cfile/cfile_reader.h
+++ b/src/kudu/cfile/cfile_reader.h
@@ -119,7 +119,7 @@ class CFileReader {
   //
   // Note that this implementation is currently O(n), so should not be used
   // in a hot path.
-  bool GetMetadataEntry(const std::string &key, std::string *val);
+  bool GetMetadataEntry(const std::string &key, std::string *val) const;
 
   // Can be called before Init().
   uint64_t file_size() const {

http://git-wip-us.apache.org/repos/asf/kudu/blob/94136dbb/src/kudu/gutil/strings/join.h
----------------------------------------------------------------------
diff --git a/src/kudu/gutil/strings/join.h b/src/kudu/gutil/strings/join.h
index 104ab90..c7c5c85 100644
--- a/src/kudu/gutil/strings/join.h
+++ b/src/kudu/gutil/strings/join.h
@@ -180,16 +180,17 @@ inline std::string JoinStrings(const CONTAINER& components,
 // 'components'.
 template<class CONTAINER, typename FUNC>
 std::string JoinMapped(const CONTAINER& components,
-                  const FUNC& functor,
-                  const StringPiece& delim) {
+                       const FUNC& functor,
+                       const StringPiece& delim) {
   std::string result;
-  for (typename CONTAINER::const_iterator iter = components.begin();
-      iter != components.end();
-      iter++) {
-    if (iter != components.begin()) {
+  bool append_delim = false;
+  for (const auto& component : components) {
+    if (append_delim) {
       result.append(delim.data(), delim.size());
+    } else {
+      append_delim = true;
     }
-    result.append(functor(*iter));
+    result.append(functor(component));
   }
   return result;
 }

http://git-wip-us.apache.org/repos/asf/kudu/blob/94136dbb/src/kudu/tablet/rowset_metadata.h
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/rowset_metadata.h b/src/kudu/tablet/rowset_metadata.h
index fa98897..dfc1ed2 100644
--- a/src/kudu/tablet/rowset_metadata.h
+++ b/src/kudu/tablet/rowset_metadata.h
@@ -134,7 +134,7 @@ class RowSetMetadata {
     return !adhoc_index_block_.IsNull();
   }
 
-  BlockId column_data_block_for_col_id(ColumnId col_id) {
+  BlockId column_data_block_for_col_id(const ColumnId& col_id) const {
     std::lock_guard<LockType> l(lock_);
     return FindOrDie(blocks_by_col_id_, col_id);
   }

http://git-wip-us.apache.org/repos/asf/kudu/blob/94136dbb/src/kudu/tools/kudu-tool-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/kudu-tool-test.cc b/src/kudu/tools/kudu-tool-test.cc
index 503737a..0f1a939 100644
--- a/src/kudu/tools/kudu-tool-test.cc
+++ b/src/kudu/tools/kudu-tool-test.cc
@@ -63,7 +63,6 @@
 #include "kudu/fs/block_manager.h"
 #include "kudu/fs/fs_manager.h"
 #include "kudu/fs/fs_report.h"
-#include "kudu/gutil/gscoped_ptr.h"
 #include "kudu/gutil/port.h"
 #include "kudu/gutil/ref_counted.h"
 #include "kudu/gutil/stl_util.h"
@@ -406,7 +405,8 @@ TEST_F(ToolTest, TestModeHelp) {
         "check.*Kudu filesystem for inconsistencies",
         "dump.*Dump a Kudu filesystem",
         "format.*new Kudu filesystem",
-        "update_dirs.*Updates the set of data directories"
+        "list.*List metadata for on-disk tablets, rowsets, blocks",
+        "update_dirs.*Updates the set of data directories",
     };
     NO_FATALS(RunTestHelp("fs", kFsModeRegexes));
     NO_FATALS(RunTestHelp("fs not_a_mode", kFsModeRegexes,
@@ -1245,6 +1245,73 @@ TEST_F(ToolTest, TestLocalReplicaOps) {
     SCOPED_TRACE(stdout);
     ASSERT_STR_MATCHES(stdout, kTestTablet);
   }
+
+  // Test 'kudu fs list' tablet group.
+  {
+    string stdout;
+    NO_FATALS(RunActionStdoutString(
+          Substitute("fs list $0 --columns=table,tablet-id --format=csv",
+                     fs_paths),
+          &stdout));
+
+    SCOPED_TRACE(stdout);
+    EXPECT_EQ(stdout, "KuduTableTest,test-tablet");
+  }
+
+  // Test 'kudu fs list' rowset group.
+  {
+    string stdout;
+    NO_FATALS(RunActionStdoutString(
+          Substitute("fs list $0 --columns=table,tablet-id,rowset-id --format=csv",
+                     fs_paths),
+          &stdout));
+
+    SCOPED_TRACE(stdout);
+    EXPECT_EQ(stdout, "KuduTableTest,test-tablet,0");
+  }
+  // Test 'kudu fs list' block group.
+  {
+    vector<string> stdout;
+    NO_FATALS(RunActionStdoutLines(
+          Substitute("fs list $0 "
+                     "--columns=table,tablet-id,rowset-id,block-kind,column "
+                     "--format=csv",
+                     fs_paths),
+          &stdout));
+
+    SCOPED_TRACE(stdout);
+    ASSERT_EQ(5, stdout.size());
+    EXPECT_EQ(stdout[0], Substitute("KuduTableTest,$0,0,column,key", kTestTablet));
+    EXPECT_EQ(stdout[1], Substitute("KuduTableTest,$0,0,column,int_val", kTestTablet));
+    EXPECT_EQ(stdout[2], Substitute("KuduTableTest,$0,0,column,string_val", kTestTablet));
+    EXPECT_EQ(stdout[3], Substitute("KuduTableTest,$0,0,undo,", kTestTablet));
+    EXPECT_EQ(stdout[4], Substitute("KuduTableTest,$0,0,bloom,", kTestTablet));
+  }
+
+  // Test 'kudu fs list' cfile group.
+  {
+    vector<string> stdout;
+    NO_FATALS(RunActionStdoutLines(
+          Substitute("fs list $0 "
+                     "--columns=table,tablet-id,rowset-id,block-kind,"
+                               "column,cfile-encoding,cfile-num-values "
+                     "--format=csv",
+                     fs_paths),
+          &stdout));
+
+    SCOPED_TRACE(stdout);
+    ASSERT_EQ(5, stdout.size());
+    EXPECT_EQ(stdout[0],
+              Substitute("KuduTableTest,$0,0,column,key,BIT_SHUFFLE,10", kTestTablet));
+    EXPECT_EQ(stdout[1],
+              Substitute("KuduTableTest,$0,0,column,int_val,BIT_SHUFFLE,10", kTestTablet));
+    EXPECT_EQ(stdout[2],
+              Substitute("KuduTableTest,$0,0,column,string_val,DICT_ENCODING,10", kTestTablet));
+    EXPECT_EQ(stdout[3],
+              Substitute("KuduTableTest,$0,0,undo,,PLAIN_ENCODING,10", kTestTablet));
+    EXPECT_EQ(stdout[4],
+              Substitute("KuduTableTest,$0,0,bloom,,PLAIN_ENCODING,0", kTestTablet));
+  }
 }
 
 // Create and start Kudu mini cluster, optionally creating a table in the DB,

http://git-wip-us.apache.org/repos/asf/kudu/blob/94136dbb/src/kudu/tools/tool_action.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/tool_action.cc b/src/kudu/tools/tool_action.cc
index 80af0ce..307c8ce 100644
--- a/src/kudu/tools/tool_action.cc
+++ b/src/kudu/tools/tool_action.cc
@@ -19,6 +19,7 @@
 
 #include <algorithm>
 #include <memory>
+#include <ostream>
 #include <string>
 #include <unordered_map>
 #include <utility>
@@ -231,7 +232,7 @@ ActionBuilder& ActionBuilder::AddOptionalParameter(string param,
 #ifndef NDEBUG
   // Make sure this gflag exists.
   string option;
-  DCHECK(google::GetCommandLineOption(param.c_str(), &option));
+  DCHECK(google::GetCommandLineOption(param.c_str(), &option)) << "unknown option: " << param;
 #endif
   args_.optional.emplace_back(ActionArgsDescriptor::Flag({ std::move(param),
                                                            std::move(default_value),

http://git-wip-us.apache.org/repos/asf/kudu/blob/94136dbb/src/kudu/tools/tool_action_fs.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/tool_action_fs.cc b/src/kudu/tools/tool_action_fs.cc
index 5584588..352a47d 100644
--- a/src/kudu/tools/tool_action_fs.cc
+++ b/src/kudu/tools/tool_action_fs.cc
@@ -24,8 +24,11 @@
 #include <memory>
 #include <string>
 #include <unordered_map>
+#include <utility>
 #include <vector>
 
+#include <boost/container/flat_map.hpp>
+#include <boost/container/vector.hpp>
 #include <boost/optional/optional.hpp>
 #include <gflags/gflags.h>
 #include <gflags/gflags_declare.h>
@@ -34,6 +37,12 @@
 #include "kudu/cfile/cfile.pb.h"
 #include "kudu/cfile/cfile_reader.h"
 #include "kudu/cfile/cfile_util.h"
+#include "kudu/cfile/type_encodings.h"
+#include "kudu/common/common.pb.h"
+#include "kudu/common/encoded_key.h"
+#include "kudu/common/partition.h"
+#include "kudu/common/schema.h"
+#include "kudu/common/types.h"
 #include "kudu/fs/block_id.h"
 #include "kudu/fs/block_manager.h"
 #include "kudu/fs/fs_manager.h"
@@ -41,16 +50,32 @@
 #include "kudu/gutil/gscoped_ptr.h"
 #include "kudu/gutil/map-util.h"
 #include "kudu/gutil/ref_counted.h"
+#include "kudu/gutil/strings/ascii_ctype.h"
+#include "kudu/gutil/strings/human_readable.h"
+#include "kudu/gutil/strings/join.h"
 #include "kudu/gutil/strings/numbers.h"
+#include "kudu/gutil/strings/split.h"
+#include "kudu/gutil/strings/stringpiece.h"
+#include "kudu/gutil/strings/strip.h"
 #include "kudu/gutil/strings/substitute.h"
+#include "kudu/tablet/delta_stats.h"
+#include "kudu/tablet/deltafile.h"
+#include "kudu/tablet/diskrowset.h"
+#include "kudu/tablet/rowset_metadata.h"
+#include "kudu/tablet/tablet.pb.h"
 #include "kudu/tablet/tablet_metadata.h"
+#include "kudu/tools/tool_action_common.h"
+#include "kudu/util/compression/compression.pb.h"
 #include "kudu/util/env.h"
 #include "kudu/util/faststring.h"
+#include "kudu/util/memory/arena.h"
 #include "kudu/util/pb_util.h"
 #include "kudu/util/slice.h"
 #include "kudu/util/status.h"
 
 DECLARE_bool(print_meta);
+DECLARE_string(columns);
+
 DEFINE_bool(print_rows, true,
             "Print each row in the CFile");
 DEFINE_string(uuid, "",
@@ -59,23 +84,37 @@ DEFINE_string(uuid, "",
 DEFINE_bool(repair, false,
             "Repair any inconsistencies in the filesystem.");
 
+DEFINE_string(table_id, "",
+              "Restrict output to a specific table");
+DEFINE_string(tablet_id, "",
+              "Restrict output to a specific tablet");
+DEFINE_int64(rowset_id, -1,
+             "Restrict output to a specific rowset");
+DEFINE_int32(column_id, -1,
+             "Restrict output to a specific column");
+DEFINE_uint64(block_id, 0,
+              "Restrict output to a specific block");
+DEFINE_bool(h, true,
+            "Pretty-print values in human-readable units");
+
 namespace kudu {
 namespace tools {
 
-using cfile::CFileReader;
 using cfile::CFileIterator;
+using cfile::CFileReader;
 using cfile::ReaderOptions;
 using fs::BlockDeletionTransaction;
 using fs::FsReport;
 using fs::ReadableBlock;
 using std::cout;
 using std::endl;
-using std::string;
 using std::shared_ptr;
+using std::string;
 using std::unique_ptr;
 using std::unordered_map;
 using std::vector;
 using strings::Substitute;
+using tablet::RowSetMetadata;
 using tablet::TabletMetadata;
 
 namespace {
@@ -283,6 +322,434 @@ Status Update(const RunnerContext& /*context*/) {
   return fs.Open();
 }
 
+namespace {
+
+// The 'kudu fs list' column fields.
+//
+// Field is synonymous with a data-table column, but internally we use 'field'
+// in order to disambiguate with Kudu columns.
+enum class Field {
+
+  // Tablet-specific information:
+  kTable,
+  kTableId,
+  kTabletId,
+  kPartition,
+
+  // Rowset-specific information:
+  kRowsetId,
+
+  // Block-specific information:
+  kBlockId,
+  kBlockKind,
+  kColumn,
+  kColumnId,
+
+  // CFile specific information:
+  kCFileDataType,
+  kCFileNullable,
+  kCFileEncoding,
+  kCFileCompression,
+  kCFileNumValues,
+  kCFileSize,
+  kCFileMinKey,
+  kCFileMaxKey,
+  kCFileIncompatibleFeatures,
+  kCFileCompatibleFeatures,
+  kCFileDeltaStats,
+};
+
+// Enumerable array of field variants. Must be kept in-sync with the Field enum class.
+const Field kFieldVariants[] = {
+  Field::kTable,
+  Field::kTableId,
+  Field::kTabletId,
+  Field::kPartition,
+  Field::kRowsetId,
+  Field::kBlockId,
+  Field::kBlockKind,
+  Field::kColumn,
+  Field::kColumnId,
+  Field::kCFileDataType,
+  Field::kCFileNullable,
+  Field::kCFileEncoding,
+  Field::kCFileCompression,
+  Field::kCFileNumValues,
+  Field::kCFileSize,
+  Field::kCFileIncompatibleFeatures,
+  Field::kCFileCompatibleFeatures,
+  Field::kCFileMinKey,
+  Field::kCFileMaxKey,
+  Field::kCFileDeltaStats,
+};
+
+// Groups the fields into categories based on their cardinality and required metadata.
+enum class FieldGroup {
+  // Cardinality: 1 row per tablet
+  // Metadata: TabletMetadata
+  kTablet,
+
+  // Cardinality: 1 row per rowset per tablet
+  // Metadata: RowSetMetadata, TabletMetadata
+  kRowset,
+
+  // Cardinality: 1 row per block per rowset per tablet
+  // Metadata: RowSetMetadata, TabletMetadata
+  kBlock,
+
+  // Cardinality: 1 row per block per rowset per tablet
+  // Metadata: CFileReader, RowSetMetadata, TabletMetadata
+  kCFile,
+};
+
+// Returns the pretty-printed field name.
+const char* ToString(Field field) {
+  switch (field) {
+    case Field::kTable: return "table";
+    case Field::kTableId: return "table-id";
+    case Field::kTabletId: return "tablet-id";
+    case Field::kPartition: return "partition";
+    case Field::kRowsetId: return "rowset-id";
+    case Field::kBlockId: return "block-id";
+    case Field::kBlockKind: return "block-kind";
+    case Field::kColumn: return "column";
+    case Field::kColumnId: return "column-id";
+    case Field::kCFileDataType: return "cfile-data-type";
+    case Field::kCFileNullable: return "cfile-nullable";
+    case Field::kCFileEncoding: return "cfile-encoding";
+    case Field::kCFileCompression: return "cfile-compression";
+    case Field::kCFileNumValues: return "cfile-num-values";
+    case Field::kCFileSize: return "cfile-size";
+    case Field::kCFileIncompatibleFeatures: return "cfile-incompatible-features";
+    case Field::kCFileCompatibleFeatures: return "cfile-compatible-features";
+    case Field::kCFileMinKey: return "cfile-min-key";
+    case Field::kCFileMaxKey: return "cfile-max-key";
+    case Field::kCFileDeltaStats: return "cfile-delta-stats";
+  }
+  LOG(FATAL) << "unhandled field (this is a bug)";
+}
+
+// Returns the pretty-printed group name.
+const char* ToString(FieldGroup group) {
+  switch (group) {
+    case FieldGroup::kTablet: return "tablet";
+    case FieldGroup::kRowset: return "rowset";
+    case FieldGroup::kBlock: return "block";
+    case FieldGroup::kCFile: return "cfile";
+    default: LOG(FATAL) << "unhandled field group (this is a bug)";
+  }
+}
+
+// Transforms an ASCII string to lowercase.
+void ToLowerCase(string* string) {
+  std::transform(string->begin(), string->end(), string->begin(), ascii_tolower);
+}
+
+// Parses a field name and returns the corresponding enum variant.
+Status ParseField(string name, Field* field) {
+  StripWhiteSpace(&name);
+  StripString(&name, "_", '-');
+  ToLowerCase(&name);
+
+  for (Field variant : kFieldVariants) {
+    if (name == ToString(variant)) {
+      *field = variant;
+      return Status::OK();
+    }
+  }
+
+  return Status::InvalidArgument("unknown column", name);
+}
+
+FieldGroup ToFieldGroup(Field field) {
+  switch (field) {
+    case Field::kTable:
+    case Field::kTableId:
+    case Field::kTabletId:
+    case Field::kPartition: return FieldGroup::kTablet;
+
+    case Field::kRowsetId: return FieldGroup::kRowset;
+
+    case Field::kBlockId:
+    case Field::kBlockKind:
+    case Field::kColumn:
+    case Field::kColumnId: return FieldGroup::kBlock;
+
+    case Field::kCFileDataType:
+    case Field::kCFileNullable:
+    case Field::kCFileEncoding:
+    case Field::kCFileCompression:
+    case Field::kCFileNumValues:
+    case Field::kCFileSize:
+    case Field::kCFileIncompatibleFeatures:
+    case Field::kCFileCompatibleFeatures:
+    case Field::kCFileMinKey:
+    case Field::kCFileMaxKey:
+    case Field::kCFileDeltaStats: return FieldGroup::kCFile;
+  }
+  LOG(FATAL) << "unhandled field (this is a bug): " << ToString(field);
+}
+
+// Returns tablet info for the field.
+string TabletInfo(Field field, const TabletMetadata& tablet) {
+  switch (field) {
+    case Field::kTable: return tablet.table_name();
+    case Field::kTableId: return tablet.table_id();
+    case Field::kTabletId: return tablet.tablet_id();
+    case Field::kPartition: return tablet.partition_schema()
+                                         .PartitionDebugString(tablet.partition(),
+                                                               tablet.schema());
+    default: LOG(FATAL) << "unhandled field (this is a bug): " << ToString(field);
+  }
+}
+
+// Returns rowset info for the field.
+string RowsetInfo(Field field, const TabletMetadata& tablet, const RowSetMetadata& rowset) {
+  switch (field) {
+    case Field::kRowsetId: return std::to_string(rowset.id());
+    default: return TabletInfo(field, tablet);
+  }
+}
+
+// Returns block info for the field.
+string BlockInfo(Field field,
+                 const TabletMetadata& tablet,
+                 const RowSetMetadata& rowset,
+                 const char* block_kind,
+                 boost::optional<ColumnId> column_id,
+                 const BlockId& block) {
+  CHECK(!block.IsNull());
+  switch (field) {
+    case Field::kBlockId: return std::to_string(block.id());
+    case Field::kBlockKind: return block_kind;
+
+    case Field::kColumn: if (column_id) {
+      return tablet.schema().column_by_id(*column_id).name();
+    } else { return ""; }
+
+    case Field::kColumnId: if (column_id) {
+      return std::to_string(column_id.get());
+    } else { return ""; }
+
+    default: return RowsetInfo(field, tablet, rowset);
+  }
+}
+
+// Formats the min or max primary key property from CFile metadata.
+string FormatCFileKeyMetadata(const TabletMetadata& tablet,
+                              const CFileReader& cfile,
+                              const char* property) {
+  string value;
+  if (!cfile.GetMetadataEntry(property, &value)) {
+    return "";
+  }
+
+  Arena arena(1024);
+  gscoped_ptr<EncodedKey> key;
+  CHECK_OK(EncodedKey::DecodeEncodedString(tablet.schema(), &arena, value, &key));
+  return key->Stringify(tablet.schema());
+}
+
+// Formats the delta stats property from CFile metadata.
+string FormatCFileDeltaStats(const CFileReader& cfile) {
+  string value;
+  if (!cfile.GetMetadataEntry(tablet::DeltaFileReader::kDeltaStatsEntryName, &value)) {
+    return "";
+  }
+
+  tablet::DeltaStatsPB deltastats_pb;
+  CHECK(deltastats_pb.ParseFromString(value))
+      << "failed to decode delta stats for block " << cfile.block_id();
+
+  tablet::DeltaStats deltastats;
+  CHECK_OK(deltastats.InitFromPB(deltastats_pb));
+  return deltastats.ToString();
+}
+
+// Returns cfile info for the field.
+string CFileInfo(Field field,
+                 const TabletMetadata& tablet,
+                 const RowSetMetadata& rowset,
+                 const char* block_kind,
+                 const boost::optional<ColumnId>& column_id,
+                 const BlockId& block,
+                 const CFileReader& cfile) {
+  switch (field) {
+    case Field::kCFileDataType:
+      return cfile.type_info()->name();
+    case Field::kCFileNullable:
+      return cfile.is_nullable() ? "true" : "false";
+    case Field::kCFileEncoding:
+      return EncodingType_Name(cfile.type_encoding_info()->encoding_type());
+    case Field::kCFileCompression:
+      return CompressionType_Name(cfile.footer().compression());
+    case Field::kCFileNumValues: if (FLAGS_h) {
+      return HumanReadableNum::ToString(cfile.footer().num_values());
+    } else {
+      return std::to_string(cfile.footer().num_values());
+    }
+    case Field::kCFileSize: if (FLAGS_h) {
+      return HumanReadableNumBytes::ToString(cfile.file_size());
+    } else {
+      return std::to_string(cfile.file_size());
+    }
+    case Field::kCFileIncompatibleFeatures:
+      return std::to_string(cfile.footer().incompatible_features());
+    case Field::kCFileCompatibleFeatures:
+      return std::to_string(cfile.footer().compatible_features());
+    case Field::kCFileMinKey:
+      return FormatCFileKeyMetadata(tablet, cfile, tablet::DiskRowSet::kMinKeyMetaEntryName);
+    case Field::kCFileMaxKey:
+      return FormatCFileKeyMetadata(tablet, cfile, tablet::DiskRowSet::kMaxKeyMetaEntryName);
+    case Field::kCFileDeltaStats:
+      return FormatCFileDeltaStats(cfile);
+    default: return BlockInfo(field, tablet, rowset, block_kind, column_id, block);
+  }
+}
+
+// Helper function that calls one of the above info functions repeatedly to
+// build up a row.
+template<typename F, typename... Params>
+vector<string> BuildInfoRow(F info_func,
+                            const vector<Field>& fields,
+                            const Params&... params) {
+  vector<string> row;
+  row.reserve(fields.size());
+  for (Field field : fields) {
+    row.emplace_back(info_func(field, params...));
+  }
+  return row;
+}
+
+// Helper function that opens a CFile, if necessary, builds up a row, and adds
+// it to the data table.
+//
+// If the block ID isn't valid or doesn't match the block ID filter, then the
+// block is skipped.
+Status AddBlockInfoRow(DataTable* table,
+                       FieldGroup group,
+                       const vector<Field>& fields,
+                       FsManager* fs_manager,
+                       const TabletMetadata& tablet,
+                       const RowSetMetadata& rowset,
+                       const char* block_kind,
+                       const boost::optional<ColumnId>& column_id,
+                       const BlockId& block) {
+  if (block.IsNull() || (FLAGS_block_id > 0 && FLAGS_block_id != block.id())) {
+    return Status::OK();
+  }
+  if (group == FieldGroup::kCFile) {
+    unique_ptr<CFileReader> cfile;
+    unique_ptr<ReadableBlock> readable_block;
+    RETURN_NOT_OK(fs_manager->OpenBlock(block, &readable_block));
+    RETURN_NOT_OK(CFileReader::Open(std::move(readable_block), ReaderOptions(), &cfile));
+    table->AddRow(BuildInfoRow(CFileInfo, fields, tablet, rowset, block_kind,
+                               column_id, block, *cfile));
+
+  } else {
+    table->AddRow(BuildInfoRow(BlockInfo, fields, tablet, rowset, block_kind,
+                               column_id, block));
+  }
+  return Status::OK();
+}
+} // anonymous namespace
+
+Status List(const RunnerContext& /*context*/) {
+  // Parse the required fields into the enum form, and create an output data table.
+  vector<Field> fields;
+  vector<string> columns;
+  for (StringPiece name : strings::Split(FLAGS_columns, ",", strings::SkipEmpty())) {
+    Field field;
+    RETURN_NOT_OK(ParseField(name.ToString(), &field));
+    fields.push_back(field);
+    columns.emplace_back(ToString(field));
+  }
+  DataTable table(std::move(columns));
+
+  if (fields.empty()) {
+    return table.PrintTo(cout);
+  }
+
+  FsManagerOpts fs_opts;
+  fs_opts.read_only = true;
+  FsManager fs_manager(Env::Default(), std::move(fs_opts));
+  RETURN_NOT_OK(fs_manager.Open());
+
+  // Build the list of tablets to inspect.
+  vector<string> tablet_ids;
+  if (!FLAGS_tablet_id.empty()) {
+    string tablet_id = FLAGS_tablet_id;
+    ToLowerCase(&tablet_id);
+    tablet_ids.emplace_back(std::move(tablet_id));
+  } else {
+    RETURN_NOT_OK(fs_manager.ListTabletIds(&tablet_ids));
+  }
+
+  string table_id = FLAGS_table_id;
+  ToLowerCase(&table_id);
+
+  FieldGroup group = ToFieldGroup(*std::max_element(fields.begin(), fields.end()));
+  VLOG(1) << "group: " << string(ToString(group));
+
+  for (const string& tablet_id : tablet_ids) {
+    scoped_refptr<TabletMetadata> tablet_metadata;
+    RETURN_NOT_OK(TabletMetadata::Load(&fs_manager, tablet_id, &tablet_metadata));
+    const TabletMetadata& tablet = *tablet_metadata.get();
+
+    if (!table_id.empty() && table_id != tablet.table_id()) {
+      continue;
+    }
+
+    if (group == FieldGroup::kTablet) {
+      table.AddRow(BuildInfoRow(TabletInfo, fields, tablet));
+      continue;
+    }
+
+    for (const auto& rowset_metadata : tablet.rowsets()) {
+      const RowSetMetadata& rowset = *rowset_metadata.get();
+
+      if (FLAGS_rowset_id != -1 && FLAGS_rowset_id != rowset.id()) {
+        continue;
+      }
+
+      if (group == FieldGroup::kRowset) {
+        table.AddRow(BuildInfoRow(RowsetInfo, fields, tablet, rowset));
+        continue;
+      }
+
+      auto column_blocks = rowset.GetColumnBlocksById();
+      if (FLAGS_column_id >= 0) {
+        ColumnId column_id(FLAGS_column_id);
+        auto block = FindOrNull(column_blocks, column_id);
+        if (block) {
+          RETURN_NOT_OK(AddBlockInfoRow(&table, group, fields, &fs_manager, tablet, rowset,
+                                        "column", column_id, *block));
+        }
+      } else {
+        for (const auto& col_block : column_blocks) {
+          RETURN_NOT_OK(AddBlockInfoRow(&table, group, fields, &fs_manager, tablet,
+                                        rowset, "column", col_block.first, col_block.second));
+        }
+        for (const auto& block : rowset.redo_delta_blocks()) {
+          RETURN_NOT_OK(AddBlockInfoRow(&table, group, fields, &fs_manager, tablet,
+                                        rowset, "redo", boost::none, block));
+        }
+        for (const auto& block : rowset.undo_delta_blocks()) {
+          RETURN_NOT_OK(AddBlockInfoRow(&table, group, fields, &fs_manager, tablet,
+                                        rowset, "undo", boost::none, block));
+        }
+        RETURN_NOT_OK(AddBlockInfoRow(&table, group, fields, &fs_manager, tablet,
+                                      rowset, "bloom", boost::none, rowset.bloom_block()));
+        RETURN_NOT_OK(AddBlockInfoRow(&table, group, fields, &fs_manager, tablet,
+                                      rowset, "adhoc-index", boost::none,
+                                      rowset.adhoc_index_block()));
+
+      }
+    }
+    // TODO(dan): should orphaned blocks be included, perhaps behind a flag?
+  }
+  return table.PrintTo(cout);
+}
 } // anonymous namespace
 
 static unique_ptr<Mode> BuildFsDumpMode() {
@@ -356,12 +823,40 @@ unique_ptr<Mode> BuildFsMode() {
       .AddOptionalParameter("fs_data_dirs")
       .Build();
 
+  unique_ptr<Action> list =
+      ActionBuilder("list", &List)
+      .Description("List metadata for on-disk tablets, rowsets, blocks, and cfiles")
+      .ExtraDescription("This tool is useful for discovering and gathering information about "
+                        "on-disk data. Many field types can be added to the results with the "
+                        "--columns flag, and results can be filtered to a specific table, "
+                        "tablet, rowset, column, or block through flags.\n\n"
+                        "Note: adding any of the 'cfile' fields to --columns will cause "
+                        "the tool to read on-disk metadata for each CFile in the result set, "
+                        "which could require large amounts of I/O when many results are returned.")
+      .AddOptionalParameter("fs_wal_dir")
+      .AddOptionalParameter("fs_data_dirs")
+      .AddOptionalParameter("table_id")
+      .AddOptionalParameter("tablet_id")
+      .AddOptionalParameter("rowset_id")
+      .AddOptionalParameter("column_id")
+      .AddOptionalParameter("block_id")
+      .AddOptionalParameter("columns", string("tablet-id, rowset-id, block-id, block-kind"),
+                            Substitute("Comma-separated list of fields to include in output.\n"
+                                       "Possible values: $0",
+                                       JoinMapped(kFieldVariants, [] (Field field) {
+                                                    return ToString(field);
+                                                  }, ", ")))
+      .AddOptionalParameter("format")
+      .AddOptionalParameter("h")
+      .Build();
+
   return ModeBuilder("fs")
       .Description("Operate on a local Kudu filesystem")
       .AddMode(BuildFsDumpMode())
       .AddAction(std::move(update))
       .AddAction(std::move(check))
       .AddAction(std::move(format))
+      .AddAction(std::move(list))
       .Build();
 }
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/94136dbb/src/kudu/tools/tool_action_tserver.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tools/tool_action_tserver.cc b/src/kudu/tools/tool_action_tserver.cc
index da7b7d0..1a1dc85 100644
--- a/src/kudu/tools/tool_action_tserver.cc
+++ b/src/kudu/tools/tool_action_tserver.cc
@@ -107,31 +107,31 @@ Status ListTServers(const RunnerContext& context) {
     vector<string> values;
     if (boost::iequals(column, "uuid")) {
       for (const auto& server : servers) {
-        values.push_back(server.instance_id().permanent_uuid());
+        values.emplace_back(server.instance_id().permanent_uuid());
       }
     } else if (boost::iequals(column, "seqno")) {
       for (const auto& server : servers) {
-        values.push_back(std::to_string(server.instance_id().instance_seqno()));
+        values.emplace_back(std::to_string(server.instance_id().instance_seqno()));
       }
     } else if (boost::iequals(column, "rpc-addresses") ||
                boost::iequals(column, "rpc_addresses")) {
       for (const auto& server : servers) {
-        values.push_back(JoinMapped(server.registration().rpc_addresses(),
-                                    hostport_to_string, ","));
+        values.emplace_back(JoinMapped(server.registration().rpc_addresses(),
+                                       hostport_to_string, ","));
       }
     } else if (boost::iequals(column, "http-addresses") ||
                boost::iequals(column, "http_addresses")) {
       for (const auto& server : servers) {
-        values.push_back(JoinMapped(server.registration().http_addresses(),
-                                    hostport_to_string, ","));
+        values.emplace_back(JoinMapped(server.registration().http_addresses(),
+                                       hostport_to_string, ","));
       }
     } else if (boost::iequals(column, "version")) {
       for (const auto& server : servers) {
-        values.push_back(server.registration().software_version());
+        values.emplace_back(server.registration().software_version());
       }
     } else if (boost::iequals(column, "heartbeat")) {
       for (const auto& server : servers) {
-        values.push_back(strings::Substitute("$0ms", server.millis_since_heartbeat()));
+        values.emplace_back(strings::Substitute("$0ms", server.millis_since_heartbeat()));
       }
     } else {
       return Status::InvalidArgument("unknown column (--columns)", column);
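
Note on the 'kudu fs list' row-building pattern in tool_action_fs.cc above: List() takes the maximum of the requested fields to decide how deep to traverse (tablet, rowset, block, or cfile), then BuildInfoRow calls a per-level info function once per field. The following is a minimal standalone sketch of that idea, not the Kudu implementation. The Field and FieldGroup enumerators, the Tablet and Rowset structs, and DeepestGroup are hypothetical stand-ins, and it assumes the enum is ordered from coarse to fine, which the use of std::max_element suggests but this diff does not show.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical fields, assumed to be declared from coarsest to finest so that
// the largest requested field identifies the deepest level to visit.
enum class Field { kTable, kTabletId, kRowsetId, kBlockId };
enum class FieldGroup { kTablet, kRowset, kBlock };

// Maps a field to the traversal level that is able to produce it.
FieldGroup ToFieldGroup(Field field) {
  switch (field) {
    case Field::kTable:
    case Field::kTabletId: return FieldGroup::kTablet;
    case Field::kRowsetId: return FieldGroup::kRowset;
    case Field::kBlockId:  return FieldGroup::kBlock;
  }
  return FieldGroup::kBlock;
}

// Picks the deepest group needed. Precondition: 'fields' is non-empty
// (List() in the diff returns early when no fields are requested).
FieldGroup DeepestGroup(const std::vector<Field>& fields) {
  return ToFieldGroup(*std::max_element(fields.begin(), fields.end()));
}

// Hypothetical stand-ins for TabletMetadata and RowSetMetadata.
struct Tablet { std::string table; std::string id; };
struct Rowset { int id; };

// Info function for the rowset level: renders one field as a string.
std::string RowsetInfo(Field field, const Tablet& tablet, const Rowset& rowset) {
  switch (field) {
    case Field::kTable:    return tablet.table;
    case Field::kTabletId: return tablet.id;
    case Field::kRowsetId: return std::to_string(rowset.id);
    default:               return "";  // Not available at this level.
  }
}

// Same shape as the helper in the diff: calls the info function once per
// requested field, forwarding whatever context that level requires.
template <typename F, typename... Params>
std::vector<std::string> BuildInfoRow(F info_func,
                                      const std::vector<Field>& fields,
                                      const Params&... params) {
  std::vector<std::string> row;
  row.reserve(fields.size());
  for (Field field : fields) {
    row.emplace_back(info_func(field, params...));
  }
  return row;
}
```

Because the context arguments are a parameter pack, the same helper serves every level: a tablet-level info function takes only the tablet, while a block-level one can take the tablet, rowset, and block.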

