public class SimpleKuduEventProducer implements KuduEventProducer {
-  private byte[] payload;
-  private KuduTable table;
-  private String payloadColumn;
-
-  public SimpleKuduEventProducer(){
-  }
-
-  @Override
-  public void configure(Context context) {
-    payloadColumn = context.getString("payloadColumn","payload");
-  }
-
-  @Override
-  public void configure(ComponentConfiguration conf) {
-  }
-
-  @Override
-  public void initialize(Event event, KuduTable table) {
-    this.payload = event.getBody();
-    this.table = table;
-  }
-
-  @Override
-  public List<Operation> getOperations() throws FlumeException {
-    try {
-      Insert insert = table.newInsert();
-      PartialRow row = insert.getRow();
-      row.addBinary(payloadColumn, payload);
-
-      return Collections.singletonList((Operation) insert);
-    } catch (Exception e){
-      throw new FlumeException("Failed to create Kudu Insert object!", e);
-    }
-  }
-
-  @Override
-  public void close() {
-  }
-}
-

public interface KuduEventProducer extends Configurable, ConfigurableComponent { - /** +

public interface KuduEventProducer extends Configurable, ConfigurableComponent {
+  /**
    * Initialize the event producer.
    * @param event to be written to Kudu
    * @param table the KuduTable object used for creating Kudu Operation objects
-   */
-  void initialize(Event event, KuduTable table);
+   */
+  void initialize(Event event, KuduTable table);
 
-  /**
+  /**
    * Get the operations that should be written out to Kudu as a result of this
    * event. This list is written to Kudu using the Kudu client API.
    * @return List of {@link org.kududb.client.Operation} which
    * are written as such to Kudu
-   */
-  List<Operation> getOperations();
+   */
+  List<Operation> getOperations();
 
-  /*
+  /*
    * Clean up any state. This will be called when the sink is being stopped.
-   */
-  void close();
-}
-

This summer I got the opportunity to intern with the Apache Kudu team at Cloudera. +My project was to optimize the Kudu scan path by implementing a technique called +index skip scan (a.k.a. scan-to-seek, see section 4.1 in [1]). I wanted to share +my experience and the progress we’ve made so far on the approach.

+ + + +

Let’s begin with discussing the current query flow in Kudu. +Consider the following table:

+ +

CREATE TABLE metrics (
+    host STRING,
+    tstamp INT,
+    clusterid INT,
+    role STRING,
+    PRIMARY KEY (host, tstamp, clusterid)
+);

+ +

Figure: Sample rows of table metrics (sorted by key columns).

+ +

In this case, by default, Kudu internally builds a primary key index (implemented as a +B-tree) for the table metrics. +As shown in the table above, the index data is sorted by the composite of all key columns. +When the user query contains the first key column (host), Kudu uses the index (as the index data is +primarily sorted on the first key column).

+ +

Now, what if the user query does not contain the first key column and instead only contains the tstamp column? +In the above case, the tstamp column values are sorted within each host value, +but are not globally sorted, so the index cannot be used directly to filter rows. +Instead, a full tablet scan is done by default. Other databases may optimize such scans by building secondary indexes +(though it might be redundant to build one on one of the primary key columns). However, this isn’t an option for Kudu, +given its lack of secondary index support.

+ +

The question is, can Kudu do better than a full tablet scan here?

+ +

The answer is yes! Let’s observe the column preceding the tstamp column. We will refer to it as the +“prefix column” and its specific value as the “prefix key”. In this example, host is the prefix column. +Note that the prefix keys are sorted in the index and that all rows of a given prefix key are also sorted by the +remaining key columns. Therefore, we can use the index to skip to the rows that have distinct prefix keys, +and also satisfy the predicate on the tstamp column. +For example, consider the query:

+ +

SELECT clusterid FROM metrics WHERE tstamp = 100;

+ +

Figure: Skip scan flow illustration. The rows in green are scanned and the rest are skipped.

+ +

The tablet server can use the index to skip to the first row with a distinct prefix key (host = helium) that +matches the predicate (tstamp = 100) and then scan through the rows until the predicate no longer matches. At that +point we know that no more rows with host = helium can satisfy the predicate, and we can skip to the next +prefix key. The same holds for every distinct value of host. Because the scan skips from one prefix key to the next, this method is +commonly known as the skip scan optimization [2, 3].
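The seek-scan-skip loop described above can be sketched in a few lines of Java. This is only an in-memory illustration, not Kudu's implementation: a `TreeSet` ordered by (host, tstamp, clusterid) stands in for the primary key index, and the names `SkipScanSketch`, `Key`, and `skipScan` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Illustrative sketch only; a sorted set stands in for the primary key index. */
class SkipScanSketch {

    /** Composite key of the example metrics table: (host, tstamp, clusterid). */
    static final class Key {
        final String host;
        final int tstamp;
        final int clusterid;

        Key(String host, int tstamp, int clusterid) {
            this.host = host;
            this.tstamp = tstamp;
            this.clusterid = clusterid;
        }
    }

    /** Sorts keys by host, then tstamp, then clusterid, like the composite index. */
    static final Comparator<Key> KEY_ORDER = Comparator
            .comparing((Key k) -> k.host)
            .thenComparingInt(k -> k.tstamp)
            .thenComparingInt(k -> k.clusterid);

    /** Returns the clusterid of every row with the given tstamp, seeking per prefix key. */
    static List<Integer> skipScan(TreeSet<Key> index, int tstamp) {
        List<Integer> result = new ArrayList<>();
        if (index.isEmpty()) {
            return result;
        }
        String host = index.first().host;
        while (host != null) {
            // Seek: first entry of this prefix key with tstamp >= the predicate value.
            Key seekKey = new Key(host, tstamp, Integer.MIN_VALUE);
            // Scan: read entries until the prefix key or the tstamp stops matching.
            for (Key k : index.tailSet(seekKey, true)) {
                if (!k.host.equals(host) || k.tstamp != tstamp) {
                    break;
                }
                result.add(k.clusterid);
            }
            // Skip: jump past every remaining entry of this prefix key.
            Key next = index.higher(new Key(host, Integer.MAX_VALUE, Integer.MAX_VALUE));
            host = (next == null) ? null : next.host;
        }
        return result;
    }
}
```

Each distinct prefix key costs two logarithmic seeks plus a scan of the matching rows, which is why the approach pays off only when the prefix column has few distinct values.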

+ +

Performance

+ +

This optimization can speed up queries significantly, depending on the cardinality (number of distinct values) of the +prefix column. The lower the prefix column cardinality, the better the skip scan performance. In fact, when the +prefix column cardinality is high, skip scan is not a viable approach. The performance graph (obtained using the example +schema and query pattern mentioned earlier) is shown below.

+ +

Based on our experiments on up to 10 million rows per tablet (as shown below), we found that skip scan +begins to perform worse than a full tablet scan when the prefix column cardinality +exceeds sqrt(number_of_rows_in_tablet). +Therefore, to benefit from skip scan when possible and to maintain consistent performance when the +prefix column cardinality is large, we have tentatively chosen to dynamically disable skip scan when the number of skips for +distinct prefix keys exceeds sqrt(number_of_rows_in_tablet). +It would be an interesting project to explore more sophisticated heuristics for deciding when +to dynamically disable skip scan.
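As a rough illustration of this cutoff, the following Java sketch counts skips against a budget of sqrt(number_of_rows_in_tablet). The class `SkipScanBudget` and its methods are hypothetical names chosen for this example; how the actual patch tracks skips and falls back to a full scan is not shown in this post.

```java
/** Illustrative sketch of the tentative heuristic; not the code in the patch. */
final class SkipScanBudget {
    private final long maxSkips;
    private long skips = 0;

    /** The budget is the integer part of sqrt(rowsInTablet). */
    SkipScanBudget(long rowsInTablet) {
        this.maxSkips = (long) Math.floor(Math.sqrt((double) rowsInTablet));
    }

    long maxSkips() {
        return maxSkips;
    }

    /**
     * Records one skip to a new prefix key. Returns true while skip scan
     * should stay enabled, and false once the number of skips has exceeded
     * the budget and the scanner should fall back to a plain scan.
     */
    boolean recordSkip() {
        skips++;
        return skips <= maxSkips;
    }
}
```

For a tablet with 10 million rows, the budget works out to 3162 skips, so a prefix column with more distinct values than that would cause skip scan to be switched off partway through the scan.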

+ +

Figure: Performance graph comparing skip scan with a full tablet scan.

+ +

Conclusion

+ +

Skip scan optimization in Kudu can lead to huge performance benefits that scale with the size of +data in Kudu tablets. The optimization is currently a work-in-progress patch. +The implementation in the patch works only for equality predicates on the non-first primary key +columns. Note that although the example above has a single prefix +column (host), the approach generalizes to any number of prefix columns.

+ +

This work also lays the groundwork to leverage the skip scan approach and optimize query processing time in the +following use cases:

+ +

Range predicates
In-list predicates

+ +

This was my first time working on an open source project. I thoroughly enjoyed working on this challenging problem, +right from understanding the scan path in Kudu to working on a full-fledged implementation of +the skip scan optimization. I am very grateful to the Kudu team for guiding and supporting me throughout the +internship period.

+ +

References

+ +

[1]: Gupta, Ashish, et al. “Mesa: +Geo-replicated, near real-time, scalable data warehousing.” Proceedings of the VLDB Endowment 7.12 (2014): 1259-1270.

+ +

[2]: Index Skip Scanning - Oracle Database

+ +

[3]: Skip Scan - SQLite

+ + +

Simplified Data Pipelines with Kudu

Posted 11 Sep 2018 by Mac Noland

@@ -219,31 +242,6 @@ optimizations, incremental improvements, and bug fixes.

- -

Apache Kudu 1.6.0 released

Posted 08 Dec 2017 by Mike Percy

- -

The Apache Kudu team is happy to announce the release of Kudu 1.6.0!

- -

Apache Kudu 1.6.0 is a minor release that offers new features, performance -optimizations, incremental improvements, and bug fixes.

- -

Release highlights:

- - - -

- Read full post... -

- - -

@@ -262,6 +260,8 @@ optimizations, incremental improvements, and bug fixes.

Recent posts

Index Skip Scan Optimization in Kudu
Simplified Data Pipelines with Kudu
Getting Started with Kudu - an O'Reilly Title

Apache Kudu Weekly Update November 1st, 2016
Apache Kudu Weekly Update October 20th, 2016

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/blog/page/10/index.html ---------------------------------------------------------------------- diff --git a/blog/page/10/index.html b/blog/page/10/index.html index eff1bfa..6a60e13 100644 --- a/blog/page/10/index.html +++ b/blog/page/10/index.html @@ -117,6 +117,29 @@

Predicate Improvements in Kudu 0.8

Posted 19 Apr 2016 by Dan Burkert

+ +

The recently released Kudu version 0.8 ships with a host of new improvements to +scan predicates. Performance and usability have been improved, especially for +tables taking advantage of advanced partitioning +options.

+ + + +

+ Read full post... +

+ + + + +

Apache Kudu (incubating) Weekly Update April 18, 2016

Posted 18 Apr 2016 by Todd Lipcon

@@ -217,27 +240,6 @@ covers ongoing development and news in the Apache Kudu (incubating) project.

- -

Apache Kudu (incubating) Weekly Update April 4, 2016

Posted 04 Apr 2016 by Todd Lipcon

- -

Welcome to the third edition of the Kudu Weekly Update. This weekly blog post -covers ongoing development and news in the Apache Kudu (incubating) project.

- - - -

- Read full post... -

- - -

@@ -258,6 +260,8 @@ covers ongoing development and news in the Apache Kudu (incubating) project.

Recent posts

Index Skip Scan Optimization in Kudu
Simplified Data Pipelines with Kudu
Getting Started with Kudu - an O'Reilly Title

Apache Kudu Weekly Update November 1st, 2016
Apache Kudu Weekly Update October 20th, 2016

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/blog/page/11/index.html ---------------------------------------------------------------------- diff --git a/blog/page/11/index.html b/blog/page/11/index.html index 259609e..5b12b5e 100644 --- a/blog/page/11/index.html +++ b/blog/page/11/index.html @@ -117,6 +117,27 @@

Apache Kudu (incubating) Weekly Update April 4, 2016

Posted 04 Apr 2016 by Todd Lipcon

+ +

Welcome to the third edition of the Kudu Weekly Update. This weekly blog post +covers ongoing development and news in the Apache Kudu (incubating) project.

+ + + +

+ Read full post... +

+ + + + +

Apache Kudu (incubating) Weekly Update March 28, 2016

Posted 28 Mar 2016 by Todd Lipcon

@@ -229,6 +250,8 @@ part of the ASF Incubator, version 0.7.0!

Recent posts

Index Skip Scan Optimization in Kudu
Simplified Data Pipelines with Kudu
Getting Started with Kudu - an O'Reilly Title

Apache Kudu Weekly Update November 1st, 2016
Apache Kudu Weekly Update October 20th, 2016

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/blog/page/2/index.html ---------------------------------------------------------------------- diff --git a/blog/page/2/index.html b/blog/page/2/index.html index f0c303b..ad6a4af 100644 --- a/blog/page/2/index.html +++ b/blog/page/2/index.html @@ -117,6 +117,31 @@

Apache Kudu 1.6.0 released

Posted 08 Dec 2017 by Mike Percy

+ +

The Apache Kudu team is happy to announce the release of Kudu 1.6.0!

+ +

Apache Kudu 1.6.0 is a minor release that offers new features, performance +optimizations, incremental improvements, and bug fixes.

+ +

Release highlights:

+ + + +

+ Read full post... +

+ + + + +

Slides: A brave new world in mutable big data: Relational storage

Posted 23 Oct 2017 by Todd Lipcon

@@ -217,40 +242,6 @@ improvements, optimizations, and bug fixes.

- -

Apache Kudu 1.3.1 released

Posted 19 Apr 2017 by Todd Lipcon

- -

The Apache Kudu team is happy to announce the release of Kudu 1.3.1!

- -

Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered -in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be -incorrectly deleted after certain sequences of node failures. Several other -bugs are also fixed. See the release notes for details.

- -

Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately.

- -

Download the Kudu 1.3.1 source release
Convenience binary artifacts for the Java client and various Java -integrations (eg Spark, Flume) are also now available via the ASF Maven -repository.

- - - -

- Read full post... -

- - -

@@ -271,6 +262,8 @@ repository.

Recent posts

Index Skip Scan Optimization in Kudu
Simplified Data Pipelines with Kudu
Getting Started with Kudu - an O'Reilly Title
Apache Kudu Weekly Update November 1st, 2016
Apache Kudu Weekly Update October 20th, 2016

http://git-wip-us.apache.org/repos/asf/kudu-site/blob/12782cec/blog/page/3/index.html ---------------------------------------------------------------------- diff --git a/blog/page/3/index.html b/blog/page/3/index.html index c102756..bf91a51 100644 --- a/blog/page/3/index.html +++ b/blog/page/3/index.html @@ -117,6 +117,40 @@

Apache Kudu 1.3.1 released

Posted 19 Apr 2017 by Todd Lipcon

+ +

The Apache Kudu team is happy to announce the release of Kudu 1.3.1!

+ +

Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered +in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be +incorrectly deleted after certain sequences of node failures. Several other +bugs are also fixed. See the release notes for details.

+ +

Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately.

+ +

Download the Kudu 1.3.1 source release
Convenience binary artifacts for the Java client and various Java +integrations (eg Spark, Flume) are also now available via the ASF Maven +repository.

+ + + +

+ Read full post... +

+ + + + +

Apache Kudu 1.3.0 released

Posted 20 Mar 2017 by Todd Lipcon

@@ -202,27 +236,6 @@ covers ongoing development and news in the Apache Kudu project.

- -

Apache Kudu Weekly Update October 20th, 2016

Posted 20 Oct 2016 by Todd Lipcon

- -

Welcome to the twenty-second edition of the Kudu Weekly Update. This weekly blog post -covers ongoing development and news in the Apache Kudu project.

- - - -

- Read full post... -

- - -