kudu-commits mailing list archives

From abu...@apache.org
Subject [11/21] kudu git commit: [docs] Update docs with contributing to blog
Date Tue, 11 Dec 2018 21:11:34 GMT
http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/contributing.html
----------------------------------------------------------------------
diff --git a/docs/contributing.html b/docs/contributing.html
new file mode 100644
index 0000000..797e5f0
--- /dev/null
+++ b/docs/contributing.html
@@ -0,0 +1,980 @@
+---
+title: Contributing to Apache Kudu
+layout: default
+active_nav: docs
+last_updated: 'Last updated 2018-12-07 15:50:19 CET'
+---
+<!--
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+<div class="container">
+  <div class="row">
+    <div class="col-md-9">
+
+<h1>Contributing to Apache Kudu</h1>
+      <div class="sect1">
+<h2 id="_contributing_patches_using_gerrit"><a class="link" href="#_contributing_patches_using_gerrit">Contributing Patches Using Gerrit</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The Kudu team uses Gerrit for code review, rather than GitHub pull requests. Typically,
+you pull from GitHub but push to Gerrit, and Gerrit is used to review code and merge
+it into GitHub.</p>
+</div>
+<div class="paragraph">
+<p>See the <a href="https://www.mediawiki.org/wiki/Gerrit/Tutorial">Gerrit Tutorial</a>
+for an overview of using Gerrit for code review.</p>
+</div>
+<div class="sect2">
+<h3 id="_initial_setup_for_gerrit"><a class="link" href="#_initial_setup_for_gerrit">Initial Setup for Gerrit</a></h3>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Sign in to <a href="http://gerrit.cloudera.org:8080">Gerrit</a> using your GitHub username.</p>
+</li>
+<li>
+<p>Go to <a href="http://gerrit.cloudera.org:8080/#/settings/">Settings</a>. Update your name
+and email address on the <strong>Contact Information</strong> page, and upload an SSH public key.
+If you do not update your name, it will show up as "Anonymous Coward" in Gerrit reviews.</p>
+</li>
+<li>
+<p>If you have not done so, clone the main Kudu repository. By default, the main remote
+is called <code>origin</code>. When you fetch or pull, you will do so from <code>origin</code>.</p>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">git clone https://github.com/apache/kudu</code></pre>
+</div>
+</div>
+</li>
+<li>
+<p>Change to the new <code>kudu</code> directory.</p>
+</li>
+<li>
+<p>Add a <code>gerrit</code> remote. In the following command, replace &lt;username&gt; with your
+GitHub username.</p>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">git remote add gerrit ssh://&lt;username&gt;@gerrit.cloudera.org:29418/kudu</code></pre>
+</div>
+</div>
+</li>
+<li>
+<p>Run the following command to install the
+Gerrit <code>commit-msg</code> hook, replacing <code>&lt;username&gt;</code> with your
+GitHub username.</p>
+<div class="listingblock">
+<div class="content">
+<pre>gitdir=$(git rev-parse --git-dir); scp -p -P 29418 &lt;username&gt;@gerrit.cloudera.org:hooks/commit-msg ${gitdir}/hooks/</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Be sure you have set the Kudu repository to use <code>pull --rebase</code> by default. You
+can use the following two commands, assuming you have only ever checked out <code>master</code>
+so far:</p>
+<div class="listingblock">
+<div class="content">
+<pre>git config branch.autosetuprebase always
+git config branch.master.rebase true</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>If you have already checked out branches other than <code>master</code>, repeat the second
+command above for each of them, replacing <code>master</code> with the branch name.</p>
+</div>
+</li>
+</ol>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_submitting_patches"><a class="link" href="#_submitting_patches">Submitting Patches</a></h3>
+<div class="paragraph">
+<p>To submit a patch, first commit your change (using a descriptive multi-line
+commit message if possible), then push the request to the <code>gerrit</code> remote. For instance, to push a change
+to the <code>master</code> branch:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>git push gerrit HEAD:refs/for/master --no-thin</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>or to push a change to the <code>gh-pages</code> branch (to update the website):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>git push gerrit HEAD:refs/for/gh-pages --no-thin</pre>
+</div>
+</div>
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+While preparing a patch for review, it&#8217;s a good idea to follow
+<a href="https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_commit_guidelines">generic git commit guidelines and good practices</a>.
+</td>
+</tr>
+</table>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+The <code>--no-thin</code> argument is a workaround to prevent an error in Gerrit. See
+<a href="https://code.google.com/p/gerrit/issues/detail?id=1582" class="bare">https://code.google.com/p/gerrit/issues/detail?id=1582</a>.
+</td>
+</tr>
+</table>
+</div>
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+Consider creating Git aliases for the above commands. Gerrit also includes
+a command-line tool called
+<a href="https://www.mediawiki.org/wiki/Gerrit/Tutorial#Installing_git-review">git-review</a>,
+which you may find helpful.
+</td>
+</tr>
+</table>
+</div>
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+You can add reviewers to a patch automatically by appending the <code>r</code> option, with each
+reviewer&#8217;s GitHub username or associated email address, to the remote branch name:
+</td>
+</tr>
+</table>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>git push gerrit HEAD:refs/for/master%r=githubuser,r=example@apache.org</pre>
+</div>
+</div>
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+To find candidate reviewers for your commit, use <code>git blame</code> or <code>git log</code>
+to find out who has worked on the area you&#8217;re touching. It&#8217;s also a
+good idea to add as a reviewer anyone involved with the JIRA you&#8217;re working
+on.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>Gerrit will add a change ID to your commit message and will create a Gerrit review,
+whose URL will be emitted as part of the push reply. If desired, you can send a message
+to the <code>kudu-dev</code> mailing list, explaining your patch and requesting review.</p>
+</div>
+<div class="paragraph">
+<p>After getting feedback, you can update or amend your commit (for instance, using
+a command like <code>git commit --amend</code>) while leaving the Change
+ID intact. Push your change to Gerrit again; this creates a new patch set
+in Gerrit and notifies all reviewers about the change.</p>
+</div>
+<div class="paragraph">
+<p>When your code has been reviewed and is ready to be merged into the Kudu code base,
+a Kudu committer will merge it using Gerrit. You can discard your local branch.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_abandoning_a_review"><a class="link" href="#_abandoning_a_review">Abandoning a Review</a></h3>
+<div class="paragraph">
+<p>If your patch is not accepted or you decide to pull it from consideration, you can
+use the Gerrit UI to <strong>Abandon</strong> the patch. It will still show in Gerrit&#8217;s history,
+but will not be listed as a pending review.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_reviewing_patches_in_gerrit"><a class="link" href="#_reviewing_patches_in_gerrit">Reviewing Patches In Gerrit</a></h3>
+<div class="paragraph">
+<p>You can view a unified or side-by-side diff of changes in Gerrit using the web UI.
+To leave a comment, click the relevant line number or highlight the relevant part
+of the line, and type 'c' to bring up a comment box. To submit your comments and/or
+your review status, go up to the top level of the review and click <strong>Reply</strong>. You can
+add additional top-level comments here, and submit them.</p>
+</div>
+<div class="paragraph">
+<p>To check out code from a Gerrit review, click <strong>Download</strong> and paste the relevant Git
+commands into your Git client. You can then update the commit and push to Gerrit to
+submit a new patch set to the review, even if you were not the original author.</p>
+</div>
+<div class="paragraph">
+<p>Gerrit allows you to vote on a review. A vote of <code>+2</code> from at least one committer
+(besides the submitter) is required before the patch can be merged.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_code_style"><a class="link" href="#_code_style">Code Style</a></h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_c_code_style"><a class="link" href="#_c_code_style">C++ Code Style</a></h3>
+<div class="paragraph">
+<p>Get familiar with these guidelines so that your contributions can be reviewed and
+integrated quickly and easily.</p>
+</div>
+<div class="paragraph">
+<p>In general, Kudu follows the
+<a href="https://google.github.io/styleguide/cppguide.html">Google C++ Style Guide</a>,
+with the following exceptions:</p>
+</div>
+<div class="sect3">
+<h4 id="_notes_on_c_11"><a class="link" href="#_notes_on_c_11">Notes on C++ 11</a></h4>
+<div class="paragraph">
+<p>Kudu uses C++ 11. Check out this handy guide to C++ 11 move semantics and rvalue
+references: <a href="https://www.chromium.org/rvalue-references" class="bare">https://www.chromium.org/rvalue-references</a></p>
+</div>
+<div class="paragraph">
+<p>We aim to follow most of the same guidelines, such as, where possible, migrating
+away from <code>foo.Pass()</code> in favor of <code>std::move(foo)</code>.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_limitations_on_code_boost_code_use"><a class="link" href="#_limitations_on_code_boost_code_use">Limitations on <code>boost</code> Use</a></h4>
+<div class="paragraph">
+<p><code>boost</code> classes from header-only libraries can be used in cases where a suitable
+replacement does not exist in the Kudu code base. However:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Do not introduce dependencies on <code>boost</code> classes where equivalent functionality
+exists in the standard C++ library or in <code>src/kudu/gutil/</code>. For example, prefer
+<code>strings::Split()</code> from <code>gutil</code> rather than <code>boost::split</code>.</p>
+</li>
+<li>
+<p>Prefer using functionality from <code>boost</code> rather than re-implementing the same
+functionality, <em>unless</em> using the <code>boost</code> functionality requires excessive use of
+C++ features which are disallowed by our style guidelines. For example,
+<code>boost::spirit</code> is heavily based on template metaprogramming and should not be used.</p>
+</li>
+<li>
+<p>Do not use <code>boost</code> in any public headers for the Kudu C++ client, because
+<code>boost</code> commonly breaks backward compatibility, and passing data between two
+<code>boost</code> versions (one by the user, one by Kudu) causes serious issues.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>When in doubt about introducing a new dependency on any <code>boost</code> functionality,
+it is best to email <code>dev@kudu.apache.org</code> to start a discussion.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_line_length"><a class="link" href="#_line_length">Line length</a></h4>
+<div class="paragraph">
+<p>The Kudu team allows lines of up to 100 characters, rather than Google&#8217;s standard of 80. Try to
+keep lines under 80 characters where possible, but you can spill over to 100 if necessary.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_pointers"><a class="link" href="#_pointers">Pointers</a></h4>
+<div class="paragraph">
+<div class="title">Smart Pointers and Singly-Owned Pointers</div>
+<p>Generally, most objects should have clear "single-owner" semantics.
+Most of the time, singly-owned objects can be wrapped in a <code>unique_ptr&lt;&gt;</code>
+which ensures deletion on scope exit and prevents accidental copying.</p>
+</div>
+<div class="paragraph">
+<p>If an object is singly owned, but referenced from multiple places, such as when
+the pointed-to object is known to be valid at least as long as the pointer itself,
+associate a comment with the constructor which takes and stores the raw pointer,
+as in the following example.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-c++" data-lang="c++">  // 'blah' must remain valid for the lifetime of this class
+  MyClass(const Blah* blah) :
+    blah_(blah) {
+  }</code></pre>
+</div>
+</div>
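+<div class="paragraph">
+<p>As a rough illustration of the two cases above, the following sketch pairs an owning
+<code>unique_ptr</code> member with a non-owning raw pointer to the same object. The
+<code>Observer</code>, <code>Owner</code>, and <code>OwnershipTransfers</code> names are hypothetical,
+and the <code>Blah</code> definition is a stand-in; none of them come from the Kudu code base.</p>
+</div>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Stand-in type for illustration only.
struct Blah {
  std::string name;
};

// Hypothetical class that references a Blah without owning it.
class Observer {
 public:
  // 'blah' must remain valid for the lifetime of this object.
  explicit Observer(const Blah* blah) : blah_(blah) {}
  const std::string& name() const { return blah_->name; }

 private:
  const Blah* blah_;  // Not owned.
};

// Hypothetical single owner of a Blah.
class Owner {
 public:
  explicit Owner(std::unique_ptr<Blah> blah)
      : blah_(std::move(blah)), observer_(blah_.get()) {}
  const Observer& observer() const { return observer_; }

 private:
  // Declared before 'observer_' so it is constructed first and destroyed last,
  // which keeps the raw pointer held by 'observer_' valid.
  std::unique_ptr<Blah> blah_;
  Observer observer_;
};

// Returns true if ownership moved into the Owner and the observer can still
// reach the object through its raw pointer.
bool OwnershipTransfers() {
  std::unique_ptr<Blah> blah(new Blah{"example"});
  Owner owner(std::move(blah));
  return blah == nullptr && owner.observer().name() == "example";
}
```

+<div class="paragraph">
+<p>The member declaration order is what makes the lifetime comment on the constructor hold in this sketch:
+the owned object outlives the raw pointer that refers to it.</p>
+</div>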
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+Older parts of the Kudu code base use <code>gscoped_ptr</code> instead of
+<code>unique_ptr</code>. These are hold-overs from before Kudu adopted C++11.
+New code should not use <code>gscoped_ptr</code> except when necessary to interface
+with existing code. Alternatively, consider updating usages as you come
+across them.
+</td>
+</tr>
+</table>
+</div>
+<div class="admonitionblock warning">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-warning" title="Warning"></i>
+</td>
+<td class="content">
+Using <code>std::auto_ptr</code> is strictly disallowed because of its difficult and
+bug-prone semantics. Besides, <code>std::auto_ptr</code> is declared deprecated
+since C++11.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<div class="title">Smart Pointers for Multiply-Owned Pointers:</div>
+<p>Although single ownership is ideal, sometimes it is not possible, particularly
+when multiple threads are in play and the lifetimes of the pointers are not
+clearly defined. In these cases, you can use either <code>std::shared_ptr</code> or
+Kudu&#8217;s own <code>scoped_refptr</code> from <em>gutil/ref_counted.h</em>. Each of these mechanisms
+relies on reference counting to automatically delete the referent once no more
+pointers remain. The key difference between these two types of pointers is that
+<code>scoped_refptr</code> requires that the object extend a <code>RefCounted</code> base class, and
+stores its reference count inside the object storage itself, while <code>shared_ptr</code>
+maintains a separate reference count on the heap.</p>
+</div>
+<div class="paragraph">
+<p>The pros and cons are:</p>
+</div>
+<div class="ulist none">
+<div class="title"><code>shared_ptr</code></div>
+<ul class="none">
+<li>
+<p><span class="icon green"><i class="fa fa-plus-circle"></i></span> can be used with any type of object, without the
+object deriving from a special base class</p>
+</li>
+<li>
+<p><span class="icon green"><i class="fa fa-plus-circle"></i></span> part of the standard library and familiar to most
+C++ developers</p>
+</li>
+<li>
+<p><span class="icon green"><i class="fa fa-plus-circle"></i></span> supports the <code>weak_ptr</code> use cases:</p>
+<div class="ulist">
+<ul>
+<li>
+<p>a temporary ownership when an object needs to be accessed only if it exists</p>
+</li>
+<li>
+<p>break circular references of <code>shared_ptr</code>, if any exists due to aggregation</p>
+</li>
+</ul>
+</div>
+</li>
+<li>
+<p><span class="icon green"><i class="fa fa-plus-circle"></i></span> you can convert from the
+<code>shared_ptr</code> into the <code>weak_ptr</code> and back</p>
+</li>
+<li>
+<p><span class="icon green"><i class="fa fa-plus-circle"></i></span> if creating an instance with
+<code>std::make_shared&lt;&gt;()</code> only one allocation is made (since C++11;
+a non-binding requirement in the Standard, though)</p>
+</li>
+<li>
+<p><span class="icon red"><i class="fa fa-minus-circle"></i></span> creating a new object with
+<code>shared_ptr&lt;T&gt; p(new T)</code> requires two allocations (one for the ref count,
+and one for the object)</p>
+</li>
+<li>
+<p><span class="icon red"><i class="fa fa-minus-circle"></i></span> the ref count may not be near the object on the heap,
+so extra cache misses may be incurred on access</p>
+</li>
+<li>
+<p><span class="icon red"><i class="fa fa-minus-circle"></i></span> the <code>shared_ptr</code> instance itself requires 16 bytes
+(pointer to the ref count and pointer to the object)</p>
+</li>
+</ul>
+</div>
+<div class="ulist none">
+<div class="title"><code>scoped_refptr</code></div>
+<ul class="none">
+<li>
+<p><span class="icon green"><i class="fa fa-plus-circle fa-pro"></i></span> only requires a single allocation, and ref count
+is on the same cache line as the object</p>
+</li>
+<li>
+<p><span class="icon green"><i class="fa fa-plus-circle fa-pro"></i></span> the pointer only requires 8 bytes (since
+the ref count is within the object)</p>
+</li>
+<li>
+<p><span class="icon green"><i class="fa fa-plus-circle fa-pro"></i></span> you can manually increase or decrease
+reference counts when more control is required</p>
+</li>
+<li>
+<p><span class="icon green"><i class="fa fa-plus-circle fa-pro"></i></span> you can convert from a raw pointer back
+to a <code>scoped_refptr</code> safely without worrying about double freeing</p>
+</li>
+<li>
+<p><span class="icon green"><i class="fa fa-plus-circle fa-pro"></i></span> since we control the implementation, we
+can implement features, such as debug builds that capture the stack trace of every
+referent to help debug leaks.</p>
+</li>
+<li>
+<p><span class="icon red"><i class="fa fa-minus-circle fa-con"></i></span> the referred-to object must inherit
+from <code>RefCounted</code></p>
+</li>
+<li>
+<p><span class="icon red"><i class="fa fa-minus-circle fa-con"></i></span> does not support the <code>weak_ptr</code> use cases</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Since <code>scoped_refptr</code> is generally faster and smaller, try to use it
+rather than <code>shared_ptr</code> in new code. Existing code uses <code>shared_ptr</code>
+in many places. When interfacing with that code, you can continue to use <code>shared_ptr</code>.</p>
+</div>
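+<div class="paragraph">
+<p>The trade-offs listed above can be sketched with a deliberately minimal intrusive pointer.
+<code>Counted</code> and <code>IntrusivePtr</code> below are hypothetical, single-threaded stand-ins
+written for this example; they are not Kudu&#8217;s <code>RefCounted</code> or <code>scoped_refptr</code>
+and omit most of what a real implementation needs.</p>
+</div>

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class: the reference count lives inside the object.
// Not thread-safe; for illustration only.
class Counted {
 public:
  void AddRef() { ++refs_; }
  // Returns true when the last reference has been dropped.
  bool Release() { return --refs_ == 0; }
  int refs() const { return refs_; }

 private:
  int refs_ = 0;
};

// Hypothetical intrusive smart pointer: a single raw pointer, no control block.
template <typename T>
class IntrusivePtr {
 public:
  explicit IntrusivePtr(T* p) : p_(p) {
    if (p_ != nullptr) p_->AddRef();
  }
  IntrusivePtr(const IntrusivePtr& other) : p_(other.p_) {
    if (p_ != nullptr) p_->AddRef();
  }
  IntrusivePtr& operator=(const IntrusivePtr&) = delete;  // Kept out of the sketch.
  ~IntrusivePtr() {
    if (p_ != nullptr && p_->Release()) delete p_;
  }
  T* get() const { return p_; }

 private:
  T* p_;
};

struct Node : public Counted {
  int value = 0;
};

// Wrapping a raw pointer a second time is safe here because both wrappers
// share the count stored in the object. Returns the count while both are alive.
int RefsAfterRewrappingRawPointer() {
  IntrusivePtr<Node> first(new Node);
  Node* raw = first.get();
  IntrusivePtr<Node> second(raw);
  return raw->refs();
}

// A weak_ptr grants temporary access only while the object still exists.
bool WeakPtrObservesExpiry() {
  std::shared_ptr<Node> strong = std::make_shared<Node>();
  std::weak_ptr<Node> weak = strong;
  bool alive_before = (weak.lock() != nullptr);
  strong.reset();  // Drops the last strong reference.
  return alive_before && weak.expired();
}
```

+<div class="paragraph">
+<p>In this sketch the intrusive pointer is exactly one pointer wide, and re-wrapping a raw pointer
+does not risk a double free. Doing the same with two independent <code>shared_ptr</code> instances
+constructed from one raw pointer would create two separate ref counts.</p>
+</div>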
+</div>
+<div class="sect3">
+<h4 id="_function_binding_and_callbacks"><a class="link" href="#_function_binding_and_callbacks">Function Binding and Callbacks</a></h4>
+<div class="paragraph">
+<p>Existing code uses <code>boost::bind</code> and <code>boost::function</code> for function binding and
+callbacks. For new code, use the <code>Callback</code> and <code>Bind</code> classes in <code>gutil</code> instead.
+While less full-featured (<code>Bind</code> doesn&#8217;t support argument
+placeholders, wrapped function pointers, or function objects), they provide
+more options by way of argument lifecycle management. For example, a
+bound argument whose class extends <code>RefCounted</code> will have its reference count incremented during <code>Bind</code>
+and decremented when the <code>Callback</code> goes out of scope.</p>
+</div>
+<div class="paragraph">
+<p>See the large file comment in <em>gutil/callback.h</em> for more details, and
+<em>util/callback_bind-test.cc</em> for examples.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_gflags"><a class="link" href="#_gflags">GFlags</a></h4>
+<div class="paragraph">
+<p>Kudu uses gflags for both command-line and file-based configuration. Use these guidelines
+to add a new gflag. All new gflags must conform to these
+guidelines. Existing non-conformant ones will be made conformant in time.</p>
+</div>
+<div class="paragraph">
+<div class="title">Name</div>
+<p>The gflag&#8217;s name conveys a lot of information, so choose a good name. The name
+will propagate into other systems, such as the
+<a href="configuration_reference.html">Configuration Reference</a>.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The different parts of a multi-word name should be separated by underscores.
+For example, <code>fs_data_dirs</code>.</p>
+</li>
+<li>
+<p>The name should be prefixed with the context that it affects. For example,
+<code>webserver_num_worker_threads</code> and <code>cfile_default_block_size</code>. Context can be
+difficult to define, so bear in mind that this prefix will be
+used to group similar gflags together. If the gflag affects the entire
+process, it should not be prefixed.</p>
+</li>
+<li>
+<p>If the gflag is for a quantity, the name should be suffixed with the units.
+For example, <code>tablet_copy_idle_timeout_ms</code>.</p>
+</li>
+<li>
+<p>Where possible, use short names. This will save time for those entering
+command line options by hand.</p>
+</li>
+<li>
+<p>The name is part of Kudu&#8217;s compatibility contract, and should not change
+without very good reason.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<div class="title">Default value</div>
+<p>Choosing a default value is generally simple, but like the name, it propagates
+into other systems.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The default value is part of Kudu&#8217;s compatibility contract, and should not
+change without very good reason.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<div class="title">Description</div>
+<p>The gflag&#8217;s description should supplement the name and provide additional
+context and information. Like the name, the description propagates into other
+systems.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The description may include multiple sentences. Each should begin with a
+capital letter, end with a period, and begin one space after the previous.</p>
+</li>
+<li>
+<p>The description should NOT include the gflag&#8217;s type or default value; they are
+provided out-of-band.</p>
+</li>
+<li>
+<p>The description should be in the third person. Do not use words like <code>you</code>.</p>
+</li>
+<li>
+<p>A gflag description can be changed freely; it is not expected to remain the
+same across Kudu releases.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<div class="title">Tags</div>
+<p>Kudu&#8217;s gflag tagging mechanism adds machine-readable context to each gflag, for
+use in consuming systems such as documentation or management tools. See the large block
+comment in <em>flag_tags.h</em> for guidelines.</p>
+</div>
+<div class="ulist">
+<div class="title">Miscellaneous</div>
+<ul>
+<li>
+<p>Avoid creating multiple gflags for the same logical parameter. For
+example, many Kudu binaries need to configure a WAL directory. Rather than
+creating <code>foo_wal_dir</code> and <code>bar_wal_dir</code> gflags, it is better to have a single
+<code>kudu_wal_dir</code> gflag for use universally.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_java_code_style"><a class="link" href="#_java_code_style">Java Code Style</a></h3>
+<div class="sect3">
+<h4 id="_preconditions_vs_assert_in_the_kudu_java_client"><a class="link" href="#_preconditions_vs_assert_in_the_kudu_java_client">Preconditions vs assert in the Kudu Java client</a></h4>
+<div class="paragraph">
+<p>Use <code>assert</code> for verification of the static (i.e. non-runtime) internal
+invariants. Internal means the pre- and post-conditions which are
+completely under control of the code of a class or a function itself and cannot
+be influenced by input parameters and other runtime/dynamic conditions.</p>
+</div>
+<div class="paragraph">
+<p>Use <code>Preconditions</code> for verification of the input parameters and the other
+conditions which are outside of the control of the local code, or conditions
+which are dependent on the state of other objects/components at runtime.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-java" data-lang="java">Object pop() {
+  // Use Preconditions here because the external user of the class should not
+  // call pop() on an empty stack, but the stack itself is internally consistent
+  Preconditions.checkState(curSize &gt; 0, "stack must not be empty");
+  Object toReturn = data[--curSize];
+  // Use an assert here because if we ended up with a negative size counter,
+  // that's an indication of a broken implementation of the stack; i.e. it's
+  // an invariant, not a state check.
+  assert curSize &gt;= 0;
+  return toReturn;
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>However, keep in mind that <code>assert</code> checks are enabled only when the JVM is
+run with the <code>-ea</code> option. So, if some dynamic condition is crucial for the
+overall consistency (e.g. a data loss can occur if some dynamic condition is not
+satisfied and the code continues its execution), consider throwing an
+<code>AssertionError</code>:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-java" data-lang="java">if (!isCriticalConditionSatisfied) {
+  throw new AssertionError("cannot continue: data loss is possible otherwise");
+}</code></pre>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_references"><a class="link" href="#_references">References</a></h5>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="https://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html">Programming With Assertions</a></p>
+</li>
+<li>
+<p><a href="https://github.com/google/guava/wiki/PreconditionsExplained">Guava Preconditions Explained</a></p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_code_cmake_code_style_guide"><a class="link" href="#_code_cmake_code_style_guide"><code>CMake</code> Style Guide</a></h3>
+<div class="paragraph">
+<p><code>CMake</code> allows commands in lower, upper, or mixed case. To keep
+the CMake files consistent, please use the following guidelines:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>built-in commands</strong> in lowercase</p>
+</li>
+</ul>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>add_subdirectory(some/path)</pre>
+</div>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>built-in arguments</strong> in uppercase</p>
+</li>
+</ul>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>message(STATUS "message goes here")</pre>
+</div>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>custom commands or macros</strong> in uppercase</p>
+</li>
+</ul>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>ADD_KUDU_TEST(some-test)</pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_testing"><a class="link" href="#_testing">Testing</a></h2>
+<div class="sectionbody">
+<div class="dlist">
+<dl>
+<dt class="hdlist1">All new code should have tests.</dt>
+<dd>
+<p>Add new tests either in existing files, or create new test files as necessary.</p>
+</dd>
+<dt class="hdlist1">All bug fixes should have tests.</dt>
+<dd>
+<p>It&#8217;s OK to fix a bug without adding a
+new test if it&#8217;s triggered by an existing test case. For example, if a
+race shows up when running a multi-threaded system test after 20
+minutes or so, it&#8217;s worth trying to make a more targeted test case to
+trigger the bug. But if that&#8217;s hard to do, the existing system test
+should be enough.</p>
+</dd>
+<dt class="hdlist1">Tests should run quickly (&lt; 1s).</dt>
+<dd>
+<p>If you want to write a time-intensive
+test, make the runtime dependent on <code>KuduTest#AllowSlowTests</code>, which is
+enabled via the <code>KUDU_ALLOW_SLOW_TESTS</code> environment variable and is
+used by Jenkins test execution.</p>
+</dd>
+<dt class="hdlist1">Tests which run a number of iterations of some task should use a <code>gflags</code> command-line argument for the number of iterations.</dt>
+<dd>
+<p>This is handy for writing quick stress tests or performance tests.</p>
+</dd>
+<dt class="hdlist1">Commits which may affect performance should include before/after <code>perf-stat(1)</code> output.</dt>
+<dd>
+<p>This will show performance improvement or non-regression.
+Performance-sensitive code should include some test case which can be used as a
+targeted benchmark.</p>
+</dd>
+</dl>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_documentation"><a class="link" href="#_documentation">Documentation</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>See the
+<a href="https://github.com/apache/kudu/blob/master/docs/design-docs/doc-style-guide.adoc">Documentation Style Guide</a>
+for guidelines about contributing to the official Kudu documentation.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_blog_posts"><a class="link" href="#_blog_posts">Blog posts</a></h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_writing_a_post_on_the_kudu_blog"><a class="link" href="#_writing_a_post_on_the_kudu_blog">Writing a post on the Kudu blog</a></h3>
+<div class="paragraph">
+<p>If you are using or integrating with Kudu, consider doing a write-up about your
+use case and your integration with Kudu and submitting it to be posted as an
+article on the Kudu blog. People in the community love to read about how Kudu
+is being used around the world.</p>
+</div>
+<div class="paragraph">
+<p>Consider checking with the project developers on the Kudu Slack instance or on
+<a href="mailto:dev@kudu.apache.org">dev@kudu.apache.org</a> if you have any questions about
+the content or the topic of a potential Kudu blog post.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_submitting_a_blog_post_in_google_doc_format"><a class="link" href="#_submitting_a_blog_post_in_google_doc_format">Submitting a blog post in Google Doc format</a></h3>
+<div class="paragraph">
+<p>If you don&#8217;t have the time to learn Markdown or to submit a Gerrit change
+request, but you would still like to submit a post for the Kudu blog, feel free
+to write your post in Google Docs format and share the draft with us publicly
+on <a href="mailto:dev@kudu.apache.org">dev@kudu.apache.org</a>&#8201;&#8212;&#8201;we&#8217;ll be happy to review
+it and post it to the blog for you once it&#8217;s ready to go.</p>
+</div>
+<div class="paragraph">
+<p>If you would like to submit the post directly to Gerrit for review in Markdown
+format (the developers will appreciate it if you do), please read below.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_how_to_format_a_kudu_blog_post"><a class="link" href="#_how_to_format_a_kudu_blog_post">How to format a Kudu blog post</a></h3>
+<div class="paragraph">
+<p>Blog posts live in the <code>gh-pages</code> branch under the <code>_posts</code> directory in
+Markdown format. They&#8217;re automatically rendered by Jekyll, so for those familiar
+with Markdown or Jekyll, submitting a blog post should be fairly
+straightforward.</p>
+</div>
+<div class="paragraph">
+<p>Each post is a separate file named in the following format:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>YYYY-MM-DD-title-of-the-post.md</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The <code>YYYY-MM-DD</code> part is the date, which is included in the link as
+<code>/YYYY/MM/DD</code>; <code>title-of-the-post</code> is then used verbatim. The words of the title should be
+separated by dashes and should contain only lowercase letters of the English
+alphabet and numbers. Finally, the <code>.md</code> extension is replaced with
+<code>.html</code>.</p>
+</div>
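The naming rule above can be sketched in Python. This is only an illustration, not part of the Kudu site tooling: the helper name `post_filename` is hypothetical, and collapsing every run of non-alphanumeric characters into a single dash is an assumption made here.

```python
import re
from datetime import date


def post_filename(published: date, title: str) -> str:
    """Build a YYYY-MM-DD-title-of-the-post.md file name (hypothetical helper)."""
    # Lowercase the title, then turn every run of other characters into one dash.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title must contain at least one letter or digit")
    return f"{published.isoformat()}-{slug}.md"


print(post_filename(date(2018, 12, 11), "Example Post"))
```

Deriving the slug from the title keeps the file name within the allowed character set without manual editing.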
+<div class="paragraph">
+<p>The header contains the layout information (which is always "post"), the
+title and the author&#8217;s name.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>---
+layout: post
+title: Example Post
+author: John Doe
+---</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The actual text of the blog post goes below this header, beginning with the
+"lead" which is a short excerpt that shows up in the index. This is separated
+by the <code>&lt;!--more--&gt;</code> string from the rest of the post.</p>
+</div>
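To make the layout concrete, the following sketch splits a post into its header fields, its lead, and the rest of the text. It assumes the simple three-field header shown above and is not how Jekyll itself parses posts; `split_post` and the sample text are made up for illustration.

```python
MORE_MARKER = "<!--more-->"


def split_post(text: str):
    """Return (header_fields, lead, rest) for a post laid out as described above."""
    # The header sits between the first two '---' lines.
    _, header, content = text.split("---\n", 2)
    fields = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    # Everything before the marker is the lead shown in the index.
    lead, _, rest = content.partition(MORE_MARKER)
    return fields, lead.strip(), rest.strip()


sample = """---
layout: post
title: Example Post
author: John Doe
---
A short lead that shows up in the index.
<!--more-->
The rest of the post.
"""

fields, lead, rest = split_post(sample)
print(fields["title"], "|", lead)
```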
+</div>
+<div class="sect2">
+<h3 id="_how_to_check_the_rendering_of_a_blog_post"><a class="link" href="#_how_to_check_the_rendering_of_a_blog_post">How to check the rendering of a blog post</a></h3>
+<div class="paragraph">
+<p>Once you&#8217;ve finished the post, you can check that it looks good by running
+<code>site_tool</code>, a script in the root of the <code>gh-pages</code> branch that starts
+Jekyll and serves the rendered site locally. It requires Ruby and Python to be
+installed on your machine. Start it with the following command:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ./site_tool jekyll serve</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>On startup, it prints the URL where you can reach the site, which should
+be <a href="http://localhost:4000" class="bare">http://localhost:4000</a>. To reach the blog directly, go to
+<a href="http://localhost:4000/blog" class="bare">http://localhost:4000/blog</a>.</p>
+</div>
+<div class="paragraph">
+<p>You should be able to see the title and lead of your post along with your name
+at the top of this page. Clicking the title or the "Read full
+post&#8230;&#8203;" link shows the whole post.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_how_to_submit_a_blog_post"><a class="link" href="#_how_to_submit_a_blog_post">How to submit a blog post</a></h3>
+<div class="paragraph">
+<p>To submit the post, you&#8217;ll need to commit your change and push it to
+<a href="#_contributing_patches_using_gerrit">Gerrit</a> for review. If the post is deemed
+useful for the community and all comments are addressed, a committer can merge
+and publish your post.</p>
+</div>
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+<div class="paragraph">
+<p>If you have a GitHub account, you can fork Kudu from
+<a href="https://github.com/apache/kudu" class="bare">https://github.com/apache/kudu</a> and push the change to your fork too. GitHub will
+automatically render it on <a href="https://&lt;yourname&gt;.github.io/blog" class="bare">https://&lt;yourname&gt;.github.io/blog</a> and you can link it
+directly on Gerrit.</p>
+</div>
+<div class="paragraph">
+<p>This way the reviewers can see that the post renders well without having to
+download it, which can speed up the review process.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+</div>
+</div>
+</div>
+    </div>
+    <div class="col-md-3">
+
+  <div id="toc" data-spy="affix" data-offset-top="70">
+  <ul>
+
+      <li>
+
+          <a href="index.html">Introducing Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="release_notes.html">Kudu Release Notes</a> 
+      </li> 
+      <li>
+
+          <a href="quickstart.html">Getting Started with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="installation.html">Installation Guide</a> 
+      </li> 
+      <li>
+
+          <a href="configuration.html">Configuring Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="administration.html">Administering Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="developing.html">Developing Applications with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="schema_design.html">Kudu Schema Design</a> 
+      </li> 
+      <li>
+
+          <a href="scaling_guide.html">Kudu Scaling Guide</a> 
+      </li> 
+      <li>
+
+          <a href="security.html">Kudu Security</a> 
+      </li> 
+      <li>
+
+          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
+      </li> 
+      <li>
+
+          <a href="background_tasks.html">Background Maintenance Tasks</a> 
+      </li> 
+      <li>
+
+          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
+      </li> 
+      <li>
+
+          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
+      </li> 
+      <li>
+
+          <a href="known_issues.html">Known Issues and Limitations</a> 
+      </li> 
+      <li>
+<span class="active-toc">Contributing to Kudu</span>
+            <ul class="sectlevel1">
+<li><a href="#_contributing_patches_using_gerrit">Contributing Patches Using Gerrit</a>
+<ul class="sectlevel2">
+<li><a href="#_initial_setup_for_gerrit">Initial Setup for Gerrit</a></li>
+<li><a href="#_submitting_patches">Submitting Patches</a></li>
+<li><a href="#_abandoning_a_review">Abandoning a Review</a></li>
+<li><a href="#_reviewing_patches_in_gerrit">Reviewing Patches In Gerrit</a></li>
+</ul>
+</li>
+<li><a href="#_code_style">Code Style</a>
+<ul class="sectlevel2">
+<li><a href="#_c_code_style">C++ Code Style</a>
+<ul class="sectlevel3">
+<li><a href="#_notes_on_c_11">Notes on C++ 11</a></li>
+<li><a href="#_limitations_on_code_boost_code_use">Limitations on <code>boost</code> Use</a></li>
+<li><a href="#_line_length">Line length</a></li>
+<li><a href="#_pointers">Pointers</a></li>
+<li><a href="#_function_binding_and_callbacks">Function Binding and Callbacks</a></li>
+<li><a href="#_gflags">GFlags</a></li>
+</ul>
+</li>
+<li><a href="#_java_code_style">Java Code Style</a>
+<ul class="sectlevel3">
+<li><a href="#_preconditions_vs_assert_in_the_kudu_java_client">Preconditions vs assert in the Kudu Java client</a></li>
+</ul>
+</li>
+<li><a href="#_code_cmake_code_style_guide"><code>CMake</code> Style Guide</a></li>
+</ul>
+</li>
+<li><a href="#_testing">Testing</a></li>
+<li><a href="#_documentation">Documentation</a></li>
+<li><a href="#_blog_posts">Blog posts</a>
+<ul class="sectlevel2">
+<li><a href="#_writing_a_post_on_the_kudu_blog">Writing a post on the Kudu blog</a></li>
+<li><a href="#_submitting_a_blog_post_in_google_doc_format">Submitting a blog post in Google Doc format</a></li>
+<li><a href="#_how_to_format_a_kudu_blog_post">How to format a Kudu blog post</a></li>
+<li><a href="#_how_to_check_the_rendering_of_a_blog_post">How to check the rendering of a blog post</a></li>
+<li><a href="#_how_to_submit_a_blog_post">How to submit a blog post</a></li>
+</ul>
+</li>
+</ul> 
+      </li> 
+      <li>
+
+          <a href="export_control.html">Export Control Notice</a> 
+      </li> 
+  </ul>
+  </div>
+    </div>
+  </div>
+</div>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/developing.html
----------------------------------------------------------------------
diff --git a/docs/developing.html b/docs/developing.html
new file mode 100644
index 0000000..34b29d0
--- /dev/null
+++ b/docs/developing.html
@@ -0,0 +1,554 @@
+---
+title: Developing Applications With Apache Kudu
+layout: default
+active_nav: docs
+last_updated: 'Last updated 2018-12-07 15:50:19 CET'
+---
+<!--
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+<div class="container">
+  <div class="row">
+    <div class="col-md-9">
+
+<h1>Developing Applications With Apache Kudu</h1>
+      <div id="preamble">
+<div class="sectionbody">
+<div class="paragraph">
+<p>Kudu provides C++, Java and Python client APIs, as well as reference examples to illustrate
+their use.</p>
+</div>
+<div class="admonitionblock warning">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-warning" title="Warning"></i>
+</td>
+<td class="content">
+Use of server-side or private interfaces is not supported, and interfaces
+which are not part of public APIs have no stability guarantees.
+</td>
+</tr>
+</table>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_viewing_the_api_documentation"><a class="link" href="#_viewing_the_api_documentation">Viewing the API Documentation</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<div class="title">C++ API Documentation</div>
+<p>You can view the <a href="../cpp-client-api/index.html">C++ client API
+documentation</a> online. Alternatively, after
+<a href="installation.html#build_from_source">building Kudu from source</a>, you can
+additionally build the <code>doxygen</code> target (e.g., run <code>make doxygen</code> if using
+make) and use the locally generated API documentation by opening the
+<code>docs/doxygen/client_api/html/index.html</code> file in your favorite web browser.</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+In order to build the <code>doxygen</code> target, it&#8217;s necessary to have
+doxygen with Dot (graphviz) support installed on your build machine. If
+you installed doxygen after building Kudu from source, you will need to run
+<code>cmake</code> again to pick up the doxygen location and generate appropriate
+targets.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<div class="title">Java API Documentation</div>
+<p>You can view the <a href="../apidocs/index.html">Java API documentation</a> online.
+Alternatively, after <a href="installation.html#build_java_client">building the Java
+client</a>, Java API documentation is available in
+<code>java/kudu-client/target/apidocs/index.html</code>.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_working_examples"><a class="link" href="#_working_examples">Working Examples</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Several example applications are provided in the
+<a href="https://github.com/apache/kudu/tree/master/examples">examples directory</a>
+of the Apache Kudu git repository. Each example includes a <code>README</code> that shows
+how to compile and run it. The following list includes some of the
+examples that are available today. Check the repository itself in case this list goes
+out of date.</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><code>cpp/example.cc</code></dt>
+<dd>
+<p>A simple C++ application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.</p>
+</dd>
+<dt class="hdlist1"><code>java/java-example</code></dt>
+<dd>
+<p>A simple Java application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.</p>
+</dd>
+<dt class="hdlist1"><code>java/collectl</code></dt>
+<dd>
+<p>A small Java application which listens on a TCP socket for time series data corresponding to the Collectl wire protocol.
+The commonly-available collectl tool can be used to send example data to the server.</p>
+</dd>
+<dt class="hdlist1"><code>java/insert-loadgen</code></dt>
+<dd>
+<p>A Java application that generates random insert load.</p>
+</dd>
+<dt class="hdlist1"><code>python/dstat-kudu</code></dt>
+<dd>
+<p>An example program that shows how to use the Kudu Python API to load data generated by an
+external program (<code>dstat</code> in this case) into a new or existing Kudu table.</p>
+</dd>
+<dt class="hdlist1"><code>python/graphite-kudu</code></dt>
+<dd>
+<p>An example plugin for using graphite-web with Kudu as a backend.</p>
+</dd>
+</dl>
+</div>
+<div class="paragraph">
+<p>These examples should serve as helpful starting points for your own Kudu applications and integrations.</p>
+</div>
+<div class="sect2">
+<h3 id="_maven_artifacts"><a class="link" href="#_maven_artifacts">Maven Artifacts</a></h3>
+<div class="paragraph">
+<p>The following Maven <code>&lt;dependency&gt;</code> element is valid for the Apache Kudu public release
+(since 1.0.0):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-xml" data-lang="xml">&lt;dependency&gt;
+  &lt;groupId&gt;org.apache.kudu&lt;/groupId&gt;
+  &lt;artifactId&gt;kudu-client&lt;/artifactId&gt;
+  &lt;version&gt;1.1.0&lt;/version&gt;
+&lt;/dependency&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Convenience binary artifacts for the Java client and various Java integrations (e.g. Spark, Flume)
+are also now available via the <a href="http://repository.apache.org">ASF Maven repository</a> and
+<a href="https://mvnrepository.com/artifact/org.apache.kudu">Maven Central repository</a>.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_example_impala_commands_with_kudu"><a class="link" href="#_example_impala_commands_with_kudu">Example Impala Commands With Kudu</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>See <a href="kudu_impala_integration.html">Using Impala With Kudu</a> for guidance on installing
+and using Impala with Kudu, including several <code>impala-shell</code> examples.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_kudu_integration_with_spark"><a class="link" href="#_kudu_integration_with_spark">Kudu Integration with Spark</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Kudu integrates with Spark through the Data Source API as of version 1.0.0.
+Include the kudu-spark dependency using the --packages option:</p>
+</div>
+<div class="paragraph">
+<p>Use the kudu-spark_2.10 artifact if using Spark with Scala 2.10. Note that Spark 1 is no
+longer supported as of Kudu 1.6.0, so 1.5.0 is the latest version that can be used
+with Spark 1.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code>spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.5.0</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Use the kudu-spark2_2.11 artifact if using Spark 2 with Scala 2.11. Spark 2 artifacts are available
+up to version 1.7.0.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code>spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.7.0</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Then import kudu-spark and create a dataframe:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-scala" data-lang="scala">import org.apache.kudu.spark.kudu._
+import org.apache.kudu.client._
+import collection.JavaConverters._
+
+// Read a table from Kudu
+val df = spark.read.options(Map("kudu.master" -&gt; "kudu.master:7051",
+                                "kudu.table" -&gt; "kudu_table")).kudu
+
+// Query using the Spark API...
+df.select("id").filter("id &gt;= 5").show()
+
+// ...or register a temporary table and use SQL
+df.registerTempTable("kudu_table")
+// Keep the DataFrame itself; show() returns Unit and cannot be reused below
+val filteredDF = spark.sql("select id from kudu_table where id &gt;= 5")
+filteredDF.show()
+
+// Use KuduContext to create, delete, or write to Kudu tables
+val kuduContext = new KuduContext("kudu.master:7051", spark.sparkContext)
+
+// Create a new Kudu table from a dataframe schema
+// NB: No rows from the dataframe are inserted into the table
+kuduContext.createTable(
+    "test_table", df.schema, Seq("key"),
+    new CreateTableOptions()
+        .setNumReplicas(1)
+        .addHashPartitions(List("key").asJava, 3))
+
+// Insert data
+kuduContext.insertRows(df, "test_table")
+
+// Delete data
+kuduContext.deleteRows(filteredDF, "test_table")
+
+// Upsert data
+kuduContext.upsertRows(df, "test_table")
+
+// Update data
+val alteredDF = df.select($"id", ($"count" + 1).as("count"))
+kuduContext.updateRows(alteredDF, "test_table")
+
+// Data can also be inserted into the Kudu table using the data source, though the methods on
+// KuduContext are preferred
+// NB: The default is to upsert rows; to perform standard inserts instead, set operation = insert
+// in the options map
+// NB: Only mode Append is supported
+df.write.options(Map("kudu.master"-&gt; "kudu.master:7051",
+                     "kudu.table"-&gt; "test_table")).mode("append").kudu
+
+// Check for the existence of a Kudu table
+kuduContext.tableExists("another_table")
+
+// Delete a Kudu table
+kuduContext.deleteTable("unwanted_table")</code></pre>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_upsert_option_in_kudu_spark"><a class="link" href="#_upsert_option_in_kudu_spark">Upsert option in Kudu Spark</a></h3>
+<div class="paragraph">
+<p>The upsert operation in kudu-spark supports an extra write option of <code>ignoreNull</code>. If set to true,
+it will avoid setting existing column values in Kudu table to Null if the corresponding dataframe
+column values are Null. If unspecified, <code>ignoreNull</code> is false by default.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-scala" data-lang="scala">val dataDF = spark.read.options(Map("kudu.master" -&gt; "kudu.master:7051",
+  "kudu.table" -&gt; simpleTableName)).kudu
+dataDF.registerTempTable(simpleTableName)
+dataDF.show()
+// Below is the original data in the table 'simpleTableName'
++---+---+
+|key|val|
++---+---+
+|  0|foo|
++---+---+
+
+// Upsert a row with existing key 0 and val Null with ignoreNull set to true
+val nullDF = spark.createDataFrame(Seq((0, null.asInstanceOf[String]))).toDF("key", "val")
+val wo = new KuduWriteOptions
+wo.ignoreNull = true
+kuduContext.upsertRows(nullDF, simpleTableName, wo)
+dataDF.show()
+// The val field stays unchanged
++---+---+
+|key|val|
++---+---+
+|  0|foo|
++---+---+
+
+// Upsert a row with existing key 0 and val Null with ignoreNull default/set to false
+kuduContext.upsertRows(nullDF, simpleTableName)
+// Equivalent to:
+// val wo = new KuduWriteOptions
+// wo.ignoreNull = false
+// kuduContext.upsertRows(nullDF, simpleTableName, wo)
+dataDF.show()
+// The val field is set to Null this time
++---+----+
+|key| val|
++---+----+
+|  0|null|
++---+----+</code></pre>
+</div>
+</div>
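The effect of `ignoreNull` can also be modelled without a cluster. The sketch below is plain Python, not the kudu-spark API: the in-memory `table` dictionary and the `upsert` helper are hypothetical stand-ins that only mirror the behaviour described above.

```python
def upsert(table: dict, row: dict, ignore_null: bool = False) -> None:
    """Model of an upsert keyed on row['key']; `table` maps key -> row dict."""
    existing = table.get(row["key"])
    if existing is None:
        # No row with this key yet: behave like an insert.
        table[row["key"]] = dict(row)
        return
    for column, value in row.items():
        if value is None and ignore_null:
            continue  # keep the stored value instead of overwriting it with null
        existing[column] = value


table = {0: {"key": 0, "val": "foo"}}

upsert(table, {"key": 0, "val": None}, ignore_null=True)
after_ignore = dict(table[0])   # val is still "foo"

upsert(table, {"key": 0, "val": None})
after_default = dict(table[0])  # val is now None

print(after_ignore, after_default)
```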
+</div>
+<div class="sect2">
+<h3 id="_using_spark_with_a_secure_kudu_cluster"><a class="link" href="#_using_spark_with_a_secure_kudu_cluster">Using Spark with a Secure Kudu Cluster</a></h3>
+<div class="paragraph">
+<p>The Kudu Spark integration is able to operate on secure Kudu clusters which have
+authentication and encryption enabled, but the submitter of the Spark job must
+provide the proper credentials. For Spark jobs using the default 'client' deploy
+mode, the submitting user must have an active Kerberos ticket granted through
+<code>kinit</code>. For Spark jobs using the 'cluster' deploy mode, a Kerberos principal
+name and keytab location must be provided through the <code>--principal</code> and
+<code>--keytab</code> arguments to <code>spark2-submit</code>.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_spark_integration_best_practices"><a class="link" href="#_spark_integration_best_practices">Spark Integration Best Practices</a></h3>
+<div class="sect3">
+<h4 id="_avoid_multiple_kudu_clients_per_cluster"><a class="link" href="#_avoid_multiple_kudu_clients_per_cluster">Avoid multiple Kudu clients per cluster.</a></h4>
+<div class="paragraph">
+<p>One common Kudu-Spark coding error is instantiating extra <code>KuduClient</code> objects.
+In kudu-spark, a <code>KuduClient</code> is owned by the <code>KuduContext</code>. Spark application code
+should not create another <code>KuduClient</code> connecting to the same cluster. Instead,
+application code should use the <code>KuduContext</code> to access a <code>KuduClient</code> using
+<code>KuduContext#syncClient</code>.</p>
+</div>
+<div class="paragraph">
+<p>To diagnose multiple <code>KuduClient</code> instances in a Spark job, look for signs in
+the logs of the master being overloaded by many <code>GetTableLocations</code> or
+<code>GetTabletLocations</code> requests coming from different clients, usually around the
+same time. This symptom is especially likely in Spark Streaming code,
+where creating a <code>KuduClient</code> per task will result in periodic waves of master
+requests from new clients.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_spark_integration_known_issues_and_limitations"><a class="link" href="#_spark_integration_known_issues_and_limitations">Spark Integration Known Issues and Limitations</a></h3>
+<div class="ulist">
+<ul>
+<li>
+<p>Spark 2.2+ requires Java 8 at runtime even though Kudu Spark 2.x integration
+is Java 7 compatible. Spark 2.2 is the default dependency version as of
+Kudu 1.5.0.</p>
+</li>
+<li>
+<p>Kudu tables with a name containing upper case or non-ascii characters must be
+assigned an alternate name when registered as a temporary table.</p>
+</li>
+<li>
+<p>Kudu tables with a column name containing upper case or non-ascii characters
+may not be used with SparkSQL. Columns may be renamed in Kudu to work around
+this issue.</p>
+</li>
+<li>
+<p><code>&lt;&gt;</code> and <code>OR</code> predicates are not pushed to Kudu, and instead will be evaluated
+by the Spark task. Only <code>LIKE</code> predicates with a suffix wildcard are pushed to
+Kudu, meaning that <code>LIKE "FOO%"</code> is pushed down but <code>LIKE "FOO%BAR"</code> isn&#8217;t.</p>
+</li>
+<li>
+<p>Kudu does not support every type supported by Spark SQL. For example,
+<code>Date</code> and complex types are not supported.</p>
+</li>
+<li>
+<p>Kudu tables may only be registered as temporary tables in SparkSQL.
+Kudu tables may not be queried using HiveContext.</p>
+</li>
+</ul>
+</div>
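As a rough way to reason about the `LIKE` limitation in the list above, the sketch below classifies a pattern as a pure prefix match. It is an illustration only: `is_prefix_like` is a hypothetical helper, it ignores escape characters, and treating `_` as disqualifying is an assumption rather than something stated above.

```python
def is_prefix_like(pattern: str) -> bool:
    """True for patterns such as 'FOO%': one trailing '%' and no other wildcard."""
    if not pattern.endswith("%"):
        return False
    prefix = pattern[:-1]
    # Assumption: any further '%' or '_' turns the pattern into a general LIKE.
    return prefix != "" and "%" not in prefix and "_" not in prefix


for candidate in ["FOO%", "FOO%BAR", "%FOO", "FOO"]:
    print(candidate, is_prefix_like(candidate))
```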
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_kudu_python_client"><a class="link" href="#_kudu_python_client">Kudu Python Client</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The Kudu Python client provides a Python-friendly interface to the C++ client API.
+The sample below demonstrates the use of part of the Python client.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-python" data-lang="python">import kudu
+from kudu.client import Partitioning
+from datetime import datetime
+
+# Connect to Kudu master server
+client = kudu.connect(host='kudu.master', port=7051)
+
+# Define a schema for a new table
+builder = kudu.schema_builder()
+builder.add_column('key').type(kudu.int64).nullable(False).primary_key()
+builder.add_column('ts_val', type_=kudu.unixtime_micros, nullable=False, compression='lz4')
+schema = builder.build()
+
+# Define partitioning schema
+partitioning = Partitioning().add_hash_partitions(column_names=['key'], num_buckets=3)
+
+# Create new table
+client.create_table('python-example', schema, partitioning)
+
+# Open a table
+table = client.table('python-example')
+
+# Create a new session so that we can apply write operations
+session = client.new_session()
+
+# Insert a row
+op = table.new_insert({'key': 1, 'ts_val': datetime.utcnow()})
+session.apply(op)
+
+# Upsert a row
+op = table.new_upsert({'key': 2, 'ts_val': "2016-01-01T00:00:00.000000"})
+session.apply(op)
+
+# Update a row
+op = table.new_update({'key': 1, 'ts_val': ("2017-01-01", "%Y-%m-%d")})
+session.apply(op)
+
+# Delete a row
+op = table.new_delete({'key': 2})
+session.apply(op)
+
+# Flush write operations; if failures occur, print the pending errors.
+try:
+    session.flush()
+except kudu.KuduBadStatus as e:
+    print(session.get_pending_errors())
+
+# Create a scanner and add a predicate
+scanner = table.scanner()
+scanner.add_predicate(table['ts_val'] == datetime(2017, 1, 1))
+
+# Open Scanner and read all tuples
+# Note: This doesn't scale for large scans
+result = scanner.open().read_all_tuples()</code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_integration_with_mapreduce_yarn_and_other_frameworks"><a class="link" href="#_integration_with_mapreduce_yarn_and_other_frameworks">Integration with MapReduce, YARN, and Other Frameworks</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Kudu was designed to integrate with MapReduce, YARN, Spark, and other frameworks in
+the Hadoop ecosystem. See
+<a href="https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/RowCounter.java">RowCounter.java</a>
+and
+<a href="https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportCsv.java">ImportCsv.java</a>
+for examples which you can model your own integrations on. Stay tuned for more examples
+using YARN and Spark in the future.</p>
+</div>
+</div>
+</div>
+    </div>
+    <div class="col-md-3">
+
+  <div id="toc" data-spy="affix" data-offset-top="70">
+  <ul>
+
+      <li>
+
+          <a href="index.html">Introducing Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="release_notes.html">Kudu Release Notes</a> 
+      </li> 
+      <li>
+
+          <a href="quickstart.html">Getting Started with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="installation.html">Installation Guide</a> 
+      </li> 
+      <li>
+
+          <a href="configuration.html">Configuring Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="administration.html">Administering Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
+      </li> 
+      <li>
+<span class="active-toc">Developing Applications with Kudu</span>
+            <ul class="sectlevel1">
+<li><a href="#_viewing_the_api_documentation">Viewing the API Documentation</a></li>
+<li><a href="#_working_examples">Working Examples</a>
+<ul class="sectlevel2">
+<li><a href="#_maven_artifacts">Maven Artifacts</a></li>
+</ul>
+</li>
+<li><a href="#_example_impala_commands_with_kudu">Example Impala Commands With Kudu</a></li>
+<li><a href="#_kudu_integration_with_spark">Kudu Integration with Spark</a>
+<ul class="sectlevel2">
+<li><a href="#_upsert_option_in_kudu_spark">Upsert option in Kudu Spark</a></li>
+<li><a href="#_using_spark_with_a_secure_kudu_cluster">Using Spark with a Secure Kudu Cluster</a></li>
+<li><a href="#_spark_integration_best_practices">Spark Integration Best Practices</a>
+<ul class="sectlevel3">
+<li><a href="#_avoid_multiple_kudu_clients_per_cluster">Avoid multiple Kudu clients per cluster.</a></li>
+</ul>
+</li>
+<li><a href="#_spark_integration_known_issues_and_limitations">Spark Integration Known Issues and Limitations</a></li>
+</ul>
+</li>
+<li><a href="#_kudu_python_client">Kudu Python Client</a></li>
+<li><a href="#_integration_with_mapreduce_yarn_and_other_frameworks">Integration with MapReduce, YARN, and Other Frameworks</a></li>
+</ul> 
+      </li> 
+      <li>
+
+          <a href="schema_design.html">Kudu Schema Design</a> 
+      </li> 
+      <li>
+
+          <a href="scaling_guide.html">Kudu Scaling Guide</a> 
+      </li> 
+      <li>
+
+          <a href="security.html">Kudu Security</a> 
+      </li> 
+      <li>
+
+          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
+      </li> 
+      <li>
+
+          <a href="background_tasks.html">Background Maintenance Tasks</a> 
+      </li> 
+      <li>
+
+          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
+      </li> 
+      <li>
+
+          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
+      </li> 
+      <li>
+
+          <a href="known_issues.html">Known Issues and Limitations</a> 
+      </li> 
+      <li>
+
+          <a href="contributing.html">Contributing to Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="export_control.html">Export Control Notice</a> 
+      </li> 
+  </ul>
+  </div>
+    </div>
+  </div>
+</div>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/export_control.html
----------------------------------------------------------------------
diff --git a/docs/export_control.html b/docs/export_control.html
new file mode 100644
index 0000000..7dbcbb3
--- /dev/null
+++ b/docs/export_control.html
@@ -0,0 +1,158 @@
+---
+title: Export Control Notice
+layout: default
+active_nav: docs
+last_updated: 'Last updated 2018-10-01 15:26:31 CEST'
+---
+<!--
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+<div class="container">
+  <div class="row">
+    <div class="col-md-9">
+
+<h1>Export Control Notice</h1>
+      <div id="preamble">
+<div class="sectionbody">
+<div class="paragraph">
+<p>This distribution includes cryptographic software.  The country in
+which you currently reside may have restrictions on the import,
+possession, use, and/or re-export to another country, of
+encryption software.  BEFORE using any encryption software, please
+check your country&#8217;s laws, regulations and policies concerning the
+import, possession, or use, and re-export of encryption software, to
+see if this is permitted.  See <a href="http://www.wassenaar.org/" class="bare">http://www.wassenaar.org/</a> for more
+information.</p>
+</div>
+<div class="paragraph">
+<p>The U.S. Government Department of Commerce, Bureau of Industry and
+Security (BIS), has classified this software as Export Commodity
+Control Number (ECCN) 5D002.C.1, which includes information security
+software using or performing cryptographic functions with asymmetric
+algorithms.  The form and manner of this Apache Software Foundation
+distribution makes it eligible for export under the License Exception
+ENC Technology Software Unrestricted (TSU) exception (see the BIS
+Export Administration Regulations, Section 740.13) for both object
+code and source code.</p>
+</div>
+<div class="paragraph">
+<p>The following provides more details on the included cryptographic
+software:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>This software uses OpenSSL to enable TLS-encrypted connections,
+generate keys for asymmetric cryptography, and generate and
+verify signatures using those keys.</p>
+</li>
+<li>
+<p>This software uses Java SE Security libraries including the
+Java Secure Socket Extension (JSSE), Java Generic Security Service
+(JGSS), and Java Authentication and Authorization APIs (JAAS)
+to provide secure authentication and TLS-protected transport.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+    </div>
+    <div class="col-md-3">
+
+  <div id="toc" data-spy="affix" data-offset-top="70">
+  <ul>
+
+      <li>
+
+          <a href="index.html">Introducing Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="release_notes.html">Kudu Release Notes</a> 
+      </li> 
+      <li>
+
+          <a href="quickstart.html">Getting Started with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="installation.html">Installation Guide</a> 
+      </li> 
+      <li>
+
+          <a href="configuration.html">Configuring Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="administration.html">Administering Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="developing.html">Developing Applications with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="schema_design.html">Kudu Schema Design</a> 
+      </li> 
+      <li>
+
+          <a href="scaling_guide.html">Kudu Scaling Guide</a> 
+      </li> 
+      <li>
+
+          <a href="security.html">Kudu Security</a> 
+      </li> 
+      <li>
+
+          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
+      </li> 
+      <li>
+
+          <a href="background_tasks.html">Background Maintenance Tasks</a> 
+      </li> 
+      <li>
+
+          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
+      </li> 
+      <li>
+
+          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
+      </li> 
+      <li>
+
+          <a href="known_issues.html">Known Issues and Limitations</a> 
+      </li> 
+      <li>
+
+          <a href="contributing.html">Contributing to Kudu</a> 
+      </li> 
+      <li>
+<span class="active-toc">Export Control Notice</span>
+             
+      </li> 
+  </ul>
+  </div>
+    </div>
+  </div>
+</div>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/images/hash-hash-partitioning-example.png
----------------------------------------------------------------------
diff --git a/docs/images/hash-hash-partitioning-example.png b/docs/images/hash-hash-partitioning-example.png
new file mode 100644
index 0000000..c843f73
Binary files /dev/null and b/docs/images/hash-hash-partitioning-example.png differ

http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/images/hash-partitioning-example.png
----------------------------------------------------------------------
diff --git a/docs/images/hash-partitioning-example.png b/docs/images/hash-partitioning-example.png
new file mode 100644
index 0000000..56de4e8
Binary files /dev/null and b/docs/images/hash-partitioning-example.png differ

http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/images/hash-range-partitioning-example.png
----------------------------------------------------------------------
diff --git a/docs/images/hash-range-partitioning-example.png b/docs/images/hash-range-partitioning-example.png
new file mode 100644
index 0000000..6e16ada
Binary files /dev/null and b/docs/images/hash-range-partitioning-example.png differ

http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/images/kudu-architecture-2.png
----------------------------------------------------------------------
diff --git a/docs/images/kudu-architecture-2.png b/docs/images/kudu-architecture-2.png
new file mode 100644
index 0000000..fcaeba5
Binary files /dev/null and b/docs/images/kudu-architecture-2.png differ

http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/images/range-partitioning-example.png
----------------------------------------------------------------------
diff --git a/docs/images/range-partitioning-example.png b/docs/images/range-partitioning-example.png
new file mode 100644
index 0000000..23eac01
Binary files /dev/null and b/docs/images/range-partitioning-example.png differ

http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/index.html
----------------------------------------------------------------------
diff --git a/docs/index.html b/docs/index.html
new file mode 100644
index 0000000..ed9d88b
--- /dev/null
+++ b/docs/index.html
@@ -0,0 +1,474 @@
+---
+title: Introducing Apache Kudu
+layout: default
+active_nav: docs
+last_updated: 'Last updated 2018-10-12 18:36:19 CEST'
+---
+<!--
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+<div class="container">
+  <div class="row">
+    <div class="col-md-9">
+
+<h1>Introducing Apache Kudu</h1>
+      <div id="preamble">
+<div class="sectionbody">
+<div class="paragraph">
+<p>Kudu is a columnar storage manager developed for the Apache Hadoop platform.  Kudu shares
+the common technical properties of Hadoop ecosystem applications: it runs on commodity
+hardware, is horizontally scalable, and supports highly available operation.</p>
+</div>
+<div class="paragraph">
+<p>Kudu&#8217;s design sets it apart. Some of Kudu&#8217;s benefits include:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Fast processing of OLAP workloads.</p>
+</li>
+<li>
+<p>Integration with MapReduce, Spark and other Hadoop ecosystem components.</p>
+</li>
+<li>
+<p>Tight integration with Apache Impala, making it a good, mutable alternative to
+using HDFS with Apache Parquet.</p>
+</li>
+<li>
+<p>Strong but flexible consistency model, allowing you to choose consistency
+requirements on a per-request basis, including the option for strict-serializable consistency.</p>
+</li>
+<li>
+<p>Strong performance for running sequential and random workloads simultaneously.</p>
+</li>
+<li>
+<p>Easy to administer and manage with Cloudera Manager.</p>
+</li>
+<li>
+<p>High availability. Tablet Servers and Masters use the <a href="#raft">Raft Consensus Algorithm</a>, which ensures that
+as long as more than half the total number of replicas is available, the tablet is available for
+reads and writes. For instance, if 2 out of 3 replicas or 3 out of 5 replicas are available, the tablet
+is available.</p>
+<div class="paragraph">
+<p>Reads can be serviced by read-only follower tablets, even in the event of a
+leader tablet failure.</p>
+</div>
+</li>
+<li>
+<p>Structured data model.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>By combining all of these properties, Kudu targets support for families of
+applications that are difficult or impossible to implement on current generation
+Hadoop storage technologies. A few examples of applications for which Kudu is a great
+solution are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Reporting applications where newly-arrived data needs to be immediately available for end users</p>
+</li>
+<li>
+<p>Time-series applications that must simultaneously support:</p>
+<div class="ulist">
+<ul>
+<li>
+<p>queries across large amounts of historic data</p>
+</li>
+<li>
+<p>granular queries about an individual entity that must return very quickly</p>
+</li>
+</ul>
+</div>
+</li>
+<li>
+<p>Applications that use predictive models to make real-time decisions with periodic
+refreshes of the predictive model based on all historic data</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For more information about these and other scenarios, see <a href="#kudu_use_cases">Example Use Cases</a>.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_kudu_impala_integration_features"><a class="link" href="#_kudu_impala_integration_features">Kudu-Impala Integration Features</a></h2>
+<div class="sectionbody">
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><code>CREATE/ALTER/DROP TABLE</code></dt>
+<dd>
+<p>Impala supports creating, altering, and dropping tables using Kudu as the persistence layer.
+The tables follow the same internal / external approach as other tables in Impala,
+allowing for flexible data ingestion and querying.</p>
+</dd>
+<dt class="hdlist1"><code>INSERT</code></dt>
+<dd>
+<p>Data can be inserted into Kudu tables in Impala using the same syntax as
+any other Impala table, such as those using HDFS or HBase for persistence.</p>
+</dd>
+<dt class="hdlist1"><code>UPDATE</code> / <code>DELETE</code></dt>
+<dd>
+<p>Impala supports the <code>UPDATE</code> and <code>DELETE</code> SQL commands to modify existing data in
+a Kudu table row-by-row or as a batch. The syntax of the SQL commands is chosen
+to be as compatible as possible with existing standards. In addition to simple <code>DELETE</code>
+or <code>UPDATE</code> commands, you can specify complex joins with a <code>FROM</code> clause in a subquery.</p>
+</dd>
+<dt class="hdlist1">Flexible Partitioning</dt>
+<dd>
+<p>Similar to partitioning of tables in Hive, Kudu allows you to dynamically
+pre-split tables by hash or range into a predefined number of tablets, in order
+to distribute writes and queries evenly across your cluster. You can partition by
+any number of primary key columns, by any number of hashes, and an optional list of
+split rows. See <a href="schema_design.html">Schema Design</a>.</p>
+</dd>
+<dt class="hdlist1">Parallel Scan</dt>
+<dd>
+<p>To achieve the highest possible performance on modern hardware, the Kudu client
+used by Impala parallelizes scans across multiple tablets.</p>
+</dd>
+<dt class="hdlist1">High-efficiency queries</dt>
+<dd>
+<p>Where possible, Impala pushes down predicate evaluation to Kudu, so that predicates
+are evaluated as close as possible to the data. Query performance is comparable
+to Parquet in many workloads.</p>
+</dd>
+</dl>
+</div>
+<div class="paragraph">
+<p>For more details regarding querying data stored in Kudu using Impala, please
+refer to the Impala documentation.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_concepts_and_terms"><a class="link" href="#_concepts_and_terms">Concepts and Terms</a></h2>
+<div class="sectionbody">
+<div id="kudu_columnar_data_store" class="paragraph">
+<div class="title">Columnar Data Store</div>
+<p>Kudu is a <em>columnar data store</em>. A columnar data store stores data in strongly-typed
+columns. With a proper design, it is superior for analytical or data warehousing
+workloads for several reasons.</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">Read Efficiency</dt>
+<dd>
+<p>For analytical queries, you can read a single column, or a portion
+of that column, while ignoring other columns. This means you can fulfill your query
+while reading a minimal number of blocks on disk. With a row-based store, you need
+to read the entire row, even if you only return values from a few columns.</p>
+</dd>
+<dt class="hdlist1">Data Compression</dt>
+<dd>
+<p>Because a given column contains only one type of data,
+pattern-based compression can be orders of magnitude more efficient than
+compressing mixed data types, which are used in row-based solutions. Combined
+with the efficiencies of reading data from columns, compression allows you to
+fulfill your query while reading even fewer blocks from disk. See
+<a href="schema_design.html#encoding">Data Compression</a>.</p>
+</dd>
+</dl>
+</div>
+<div class="paragraph">
+<div class="title">Table</div>
+<p>A <em>table</em> is where your data is stored in Kudu. A table has a schema and
+a totally ordered primary key. A table is split into segments called tablets.</p>
+</div>
+<div class="paragraph">
+<div class="title">Tablet</div>
+<p>A <em>tablet</em> is a contiguous segment of a table, similar to a <em>partition</em> in
+other data storage engines or relational databases. A given tablet is
+replicated on multiple tablet servers, and at any given point in time,
+one of these replicas is considered the leader tablet. Any replica can service
+reads, and writes require consensus among the set of tablet servers serving the tablet.</p>
+</div>
+<div class="paragraph">
+<div class="title">Tablet Server</div>
+<p>A <em>tablet server</em> stores and serves tablets to clients. For a
+given tablet, one tablet server acts as a leader, and the others act as
+follower replicas of that tablet. Only leaders service write requests, while
+both leaders and followers can service read requests. Leaders are elected using the
+<a href="#raft">Raft Consensus Algorithm</a>. One tablet server can serve multiple tablets, and one tablet can be served
+by multiple tablet servers.</p>
+</div>
+<div class="paragraph">
+<div class="title">Master</div>
+<p>The <em>master</em> keeps track of all the tablets, tablet servers, the
+<a href="#catalog_table">Catalog Table</a>, and other metadata related to the cluster. At a given point
+in time, there can only be one acting master (the leader). If the current leader
+disappears, a new master is elected using the <a href="#raft">Raft Consensus Algorithm</a>.</p>
+</div>
+<div class="paragraph">
+<p>The master also coordinates metadata operations for clients. For example, when
+creating a new table, the client internally sends the request to the master. The
+master writes the metadata for the new table into the catalog table, and
+coordinates the process of creating tablets on the tablet servers.</p>
+</div>
+<div class="paragraph">
+<p>All the master&#8217;s data is stored in a tablet, which can be replicated to all the
+other candidate masters.</p>
+</div>
+<div class="paragraph">
+<p>Tablet servers heartbeat to the master at a set interval (the default is once
+per second).</p>
+</div>
+<div id="raft" class="paragraph">
+<div class="title">Raft Consensus Algorithm</div>
+<p>Kudu uses the <a href="https://raft.github.io/">Raft consensus algorithm</a> as
+a means to guarantee fault-tolerance and consistency, both for regular tablets and for master
+data. Through Raft, multiple replicas of a tablet elect a <em>leader</em>, which is responsible
+for accepting and replicating writes to <em>follower</em> replicas. Once a write is persisted
+in a majority of replicas it is acknowledged to the client. A given group of <code>N</code> replicas
+(usually 3 or 5) is able to accept writes with at most <code>(N - 1)/2</code> faulty replicas.</p>
+</div>
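+<div class="paragraph">
+<p>As a rough illustration of the quorum arithmetic described above, the following Python
+sketch computes the majority size and the number of faulty replicas a group of <code>N</code>
+replicas can tolerate. It is not Kudu code: the function names are hypothetical, and it
+assumes <code>(N - 1)/2</code> is evaluated with integer division.</p>
+</div>

```python
def majority_size(n: int) -> int:
    """Smallest number of replicas that forms a majority of n replicas."""
    if n < 1:
        raise ValueError("a replica group needs at least one replica")
    return n // 2 + 1


def max_faulty_replicas(n: int) -> int:
    """Largest number of faulty replicas with which a majority still exists."""
    if n < 1:
        raise ValueError("a replica group needs at least one replica")
    # Integer division: (3 - 1) // 2 == 1 and (5 - 1) // 2 == 2.
    return (n - 1) // 2


if __name__ == "__main__":
    for n in (3, 5):
        print(f"N={n}: majority={majority_size(n)}, "
              f"tolerates {max_faulty_replicas(n)} faulty")
```

+<div class="paragraph">
+<p>For 3 replicas this gives a majority of 2 and one tolerated failure; for 5 replicas,
+a majority of 3 and two tolerated failures.</p>
+</div>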
+<div id="catalog_table" class="paragraph">
+<div class="title">Catalog Table</div>
+<p>The <em>catalog table</em> is the central location for
+metadata of Kudu. It stores information about tables and tablets. The catalog
+table may not be read or written directly. Instead, it is accessible
+only via metadata operations exposed in the client API.</p>
+</div>
+<div class="paragraph">
+<p>The catalog table stores two categories of metadata:</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">Tables</dt>
+<dd>
+<p>table schemas, locations, and states</p>
+</dd>
+<dt class="hdlist1">Tablets</dt>
+<dd>
+<p>the list of existing tablets, which tablet servers have replicas of
+each tablet, the tablet&#8217;s current state, and start and end keys.</p>
+</dd>
+</dl>
+</div>
+<div class="paragraph">
+<div class="title">Logical Replication</div>
+<p>Kudu replicates operations, not on-disk data. This is referred to as <em>logical replication</em>,
+as opposed to <em>physical replication</em>. This has several advantages:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Although inserts and updates do transmit data over the network, deletes do not need
+to move any data. The delete operation is sent to each tablet server, which performs
+the delete locally.</p>
+</li>
+<li>
+<p>Physical operations, such as compaction, do not need to transmit the data over the
+network in Kudu. This is different from storage systems that use HDFS, where
+the blocks need to be transmitted over the network to fulfill the required number of
+replicas.</p>
+</li>
+<li>
+<p>Tablets do not need to perform compactions at the same time or on the same schedule,
+or otherwise remain in sync on the physical storage layer. This decreases the chances
+of all tablet servers experiencing high latency at the same time, due to compactions
+or heavy write loads.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_architectural_overview"><a class="link" href="#_architectural_overview">Architectural Overview</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The following diagram shows a Kudu cluster with three masters and multiple tablet
+servers, each serving multiple tablets. It illustrates how Raft consensus gives both
+the masters and the tablet servers leader and follower roles. In
+addition, a tablet server can be a leader for some tablets, and a follower for others.
+Leaders are shown in gold, while followers are shown in blue.</p>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="./images/kudu-architecture-2.png" alt="Kudu Architecture" width="800">
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="kudu_use_cases"><a class="link" href="#kudu_use_cases">Example Use Cases</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<div class="title">Streaming Input with Near Real Time Availability</div>
+<p>A common challenge in data analysis is one where new data arrives rapidly and constantly,
+and the same data needs to be available in near real time for reads, scans, and
+updates. Kudu offers the powerful combination of fast inserts and updates with
+efficient columnar scans to enable real-time analytics use cases on a single storage layer.</p>
+</div>
+<div class="paragraph">
+<div class="title">Time-series application with widely varying access patterns</div>
+<p>A time-series schema is one in which data points are organized and keyed according
+to the time at which they occurred. This can be useful for investigating the
+performance of metrics over time or attempting to predict future behavior based
+on past data. For instance, time-series customer data might be used both to store
+purchase click-stream history and to predict future purchases, or for use by a
+customer support representative. While these different types of analysis are occurring,
+inserts and mutations may also be occurring individually and in bulk, and become available
+immediately to read workloads. Kudu can handle all of these access patterns
+simultaneously in a scalable and efficient manner.</p>
+</div>
+<div class="paragraph">
+<p>Kudu is a good fit for time-series workloads for several reasons. With Kudu&#8217;s support for
+hash-based partitioning, combined with its native support for compound row keys, it is
+simple to set up a table spread across many servers without the risk of "hotspotting"
+that is commonly observed when range partitioning is used. Kudu&#8217;s columnar storage engine
+is also beneficial in this context, because many time-series workloads read only a few columns,
+as opposed to the whole row.</p>
+</div>
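+<div class="paragraph">
+<p>The hotspotting effect described above can be sketched in Python. This is an illustration
+only: the bucket count, the range boundaries, the <code>host</code> key column, and the use of
+CRC32 are assumptions made for the example, and do not reflect Kudu&#8217;s actual hash
+function or partitioning code.</p>
+</div>

```python
import bisect
import zlib
from collections import Counter

NUM_HASH_BUCKETS = 4                  # hypothetical bucket count
RANGE_BOUNDARIES = [1000, 2000, 3000]  # hypothetical split points on the timestamp


def range_partition(timestamp: int) -> int:
    """Index of the range partition that holds this timestamp."""
    return bisect.bisect_right(RANGE_BOUNDARIES, timestamp)


def hash_partition(host: str) -> int:
    """Index of the hash bucket for a key column value (CRC32 is a stand-in)."""
    return zlib.crc32(host.encode("utf-8")) % NUM_HASH_BUCKETS


if __name__ == "__main__":
    # Newly arriving rows all carry recent timestamps, from many hosts.
    rows = [(3500 + i, f"host-{i}") for i in range(100)]

    by_range = Counter(range_partition(ts) for ts, _ in rows)
    by_hash = Counter(hash_partition(host) for _, host in rows)

    # Range partitioning on time sends every new row to the last partition.
    print("range partitions written:", dict(by_range))
    # Hashing the host column distributes the same rows over the buckets.
    print("hash buckets written:", dict(by_hash))
```

+<div class="paragraph">
+<p>With range partitioning on the timestamp alone, every new row in this sketch lands in the
+last partition, while hashing a key column assigns the same rows to buckets independently
+of their arrival time.</p>
+</div>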
+<div class="paragraph">
+<p>In the past, you might have needed to use multiple data stores to handle different
+data access patterns. This practice adds complexity to your application and operations,
+and duplicates your data, doubling (or worse) the amount of storage
+required. Kudu can handle all of these access patterns natively and efficiently,
+without the need to off-load work to other data stores.</p>
+</div>
+<div class="paragraph">
+<div class="title">Predictive Modeling</div>
+<p>Data scientists often develop predictive learning models from large sets of data. The
+model and the data may need to be updated or modified often as the learning takes
+place or as the situation being modeled changes. In addition, the scientist may want
+to change one or more factors in the model to see what happens over time. Updating
+a large set of data stored in files in HDFS is resource-intensive, as each file needs
+to be completely rewritten. In Kudu, updates happen in near real time. The scientist
+can tweak the value, re-run the query, and refresh the graph in seconds or minutes,
+rather than hours or days. In addition, batch or incremental algorithms can be run
+across the data at any time, with near-real-time results.</p>
+</div>
+<div class="paragraph">
+<div class="title">Combining Data In Kudu With Legacy Systems</div>
+<p>Companies generate data from multiple sources and store it in a variety of systems
+and formats. For instance, some of your data may be stored in Kudu, some in a traditional
+RDBMS, and some in files in HDFS. You can access and query all of these sources and
+formats using Impala, without the need to change your legacy systems.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_next_steps"><a class="link" href="#_next_steps">Next Steps</a></h2>
+<div class="sectionbody">
+<div class="ulist">
+<ul>
+<li>
+<p><a href="quickstart.html">Get Started With Kudu</a></p>
+</li>
+<li>
+<p><a href="installation.html">Installing Kudu</a></p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+    </div>
+    <div class="col-md-3">
+
+  <div id="toc" data-spy="affix" data-offset-top="70">
+  <ul>
+
+      <li>
+<span class="active-toc">Introducing Kudu</span>
+            <ul class="sectlevel1">
+<li><a href="#_kudu_impala_integration_features">Kudu-Impala Integration Features</a></li>
+<li><a href="#_concepts_and_terms">Concepts and Terms</a></li>
+<li><a href="#_architectural_overview">Architectural Overview</a></li>
+<li><a href="#kudu_use_cases">Example Use Cases</a></li>
+<li><a href="#_next_steps">Next Steps</a></li>
+</ul> 
+      </li> 
+      <li>
+
+          <a href="release_notes.html">Kudu Release Notes</a> 
+      </li> 
+      <li>
+
+          <a href="quickstart.html">Getting Started with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="installation.html">Installation Guide</a> 
+      </li> 
+      <li>
+
+          <a href="configuration.html">Configuring Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="administration.html">Administering Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="developing.html">Developing Applications with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="schema_design.html">Kudu Schema Design</a> 
+      </li> 
+      <li>
+
+          <a href="scaling_guide.html">Kudu Scaling Guide</a> 
+      </li> 
+      <li>
+
+          <a href="security.html">Kudu Security</a> 
+      </li> 
+      <li>
+
+          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
+      </li> 
+      <li>
+
+          <a href="background_tasks.html">Background Maintenance Tasks</a> 
+      </li> 
+      <li>
+
+          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
+      </li> 
+      <li>
+
+          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
+      </li> 
+      <li>
+
+          <a href="known_issues.html">Known Issues and Limitations</a> 
+      </li> 
+      <li>
+
+          <a href="contributing.html">Contributing to Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="export_control.html">Export Control Notice</a> 
+      </li> 
+  </ul>
+  </div>
+    </div>
+  </div>
+</div>
\ No newline at end of file

