kudu-commits mailing list archives

From abu...@apache.org
Subject [35/52] [abbrv] [partial] kudu git commit: Updating web site for Kudu 1.8.0 release
Date Fri, 26 Oct 2018 18:57:28 GMT
http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/quickstart.html
----------------------------------------------------------------------
diff --git a/docs/quickstart.html b/docs/quickstart.html
deleted file mode 100644
index 7932b9a..0000000
--- a/docs/quickstart.html
+++ /dev/null
@@ -1,431 +0,0 @@
----
-title: Apache Kudu Quickstart
-layout: default
-active_nav: docs
-last_updated: 'Last updated 2018-06-14 08:17:56 PDT'
----
-<!--
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-
-<div class="container">
-  <div class="row">
-    <div class="col-md-9">
-
-<h1>Apache Kudu Quickstart</h1>
-      <div id="preamble">
-<div class="sectionbody">
-<div class="paragraph">
-<p>Follow these instructions to set up and run the Kudu VM, and get started with Kudu, Kudu_Impala,
-and CDH in minutes.</p>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="quickstart_vm"><a class="link" href="#quickstart_vm">Get The Kudu Quickstart VM</a></h2>
-<div class="sectionbody">
-<div class="sect2">
-<h3 id="_prerequisites"><a class="link" href="#_prerequisites">Prerequisites</a></h3>
-<div class="olist arabic">
-<ol class="arabic">
-<li>
-<p>Install <a href="https://www.virtualbox.org/">Oracle VirtualBox</a>. The VM has been tested to work
-with VirtualBox version 4.3 on Ubuntu 14.04 and VirtualBox version 5 on OSX
-10.9. VirtualBox is also included in most package managers: apt-get, brew, etc.</p>
-</li>
-<li>
-<p>After the installation, make sure that <code>VBoxManage</code> is in your <code>PATH</code> by using the
-<code>which VBoxManage</code> command.</p>
-</li>
-</ol>
-</div>
-</div>
-<div class="sect2">
-<h3 id="_installation"><a class="link" href="#_installation">Installation</a></h3>
-<div class="paragraph">
-<p>To download and start the VM, execute the following command in a terminal window.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-bash" data-lang="bash">$ curl -s https://raw.githubusercontent.com/cloudera/kudu-examples/master/demo-vm-setup/bootstrap.sh | bash</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>This command downloads a shell script which clones the <code>kudu-examples</code> Git repository and
-then downloads a VM image of about 1.2GB size into the current working
-directory.<sup class="footnote">[<a id="_footnoteref_1" class="footnote" href="#_footnote_1" title="View footnote.">1</a>]</sup> You can examine the script after downloading it by removing
-the <code>| bash</code> component of the command above. Once the setup is complete, you can verify
-that everything works by connecting to the guest via SSH:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-bash" data-lang="bash">$ ssh demo@quickstart.cloudera</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>The username and password for the demo account are both <code>demo</code>. In addition, the <code>demo</code>
-user has password-less <code>sudo</code> privileges so that you can install additional software or
-manage the guest OS. You can also access the <code>kudu-examples</code> as a shared folder in
-<code>/home/demo/kudu-examples/</code> on the guest or from your VirtualBox shared folder location on
-the host. This is a quick way to make scripts or data visible to the guest.</p>
-</div>
-<div class="paragraph">
-<p>You can quickly verify if Kudu and Impala are running by executing the following commands:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-bash" data-lang="bash">$ ps aux | grep kudu
-$ ps aux | grep impalad</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>If you have issues connecting to the VM or one of the processes is not running, make sure
-to consult the <a href="#trouble">Troubleshooting</a> section.</p>
-</div>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_load_data"><a class="link" href="#_load_data">Load Data</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>To practice some typical operations with Kudu and Impala, we&#8217;ll use the
-<a href="https://data.sfgov.org/Transportation/Raw-AVL-GPS-data/5fk7-ivit/data">San Francisco MTA
-GPS dataset</a>. This dataset contains raw location data transmitted periodically from
-sensors installed on the buses in the SF MTA&#8217;s fleet.</p>
-</div>
-<div class="olist arabic">
-<ol class="arabic">
-<li>
-<p>Download the sample data and load it into HDFS</p>
-<div class="paragraph">
-<p>First we&#8217;ll download the sample dataset, prepare it, and upload it into the HDFS
-cluster.</p>
-</div>
-<div class="paragraph">
-<p>The SF MTA&#8217;s site is often a bit slow, so we&#8217;ve mirrored a sample CSV file from the
-dataset at <a href="http://kudu-sample-data.s3.amazonaws.com/sfmtaAVLRawData01012013.csv.gz" class="bare">http://kudu-sample-data.s3.amazonaws.com/sfmtaAVLRawData01012013.csv.gz</a></p>
-</div>
-<div class="paragraph">
-<p>The original dataset uses DOS-style line endings, so we&#8217;ll convert it to
-UNIX-style during the upload process using <code>tr</code>.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-bash" data-lang="bash">$ wget http://kudu-sample-data.s3.amazonaws.com/sfmtaAVLRawData01012013.csv.gz
-$ hdfs dfs -mkdir /sfmta
-$ zcat sfmtaAVLRawData01012013.csv.gz | tr -d '\r' | hadoop fs -put - /sfmta/data.csv</code></pre>
-</div>
-</div>
-</li>
-<li>
-<p>Create a new external Impala table to access the plain text data. To connect to Impala
-in the virtual machine issue the following command:</p>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-bash" data-lang="bash">ssh demo@quickstart.cloudera -t impala-shell</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>Now, you can execute the following commands:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE EXTERNAL TABLE sfmta_raw (
-  revision int,
-  report_time string,
-  vehicle_tag int,
-  longitude float,
-  latitude float,
-  speed float,
-  heading float
-)
-ROW FORMAT DELIMITED
-FIELDS TERMINATED BY ','
-LOCATION '/sfmta/'
-TBLPROPERTIES ('skip.header.line.count'='1');</code></pre>
-</div>
-</div>
-</li>
-<li>
-<p>To validate that the data was actually loaded, run the following command:</p>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">SELECT count(*) FROM sfmta_raw;
-
-+----------+
-| count(*) |
-+----------+
-| 859086   |
-+----------+</code></pre>
-</div>
-</div>
-</li>
-<li>
-<p>Next we&#8217;ll create a Kudu table and load the data. Note that we convert
-the string <code>report_time</code> field into a UNIX-style timestamp for more efficient
-storage.</p>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE sfmta
-PRIMARY KEY (report_time, vehicle_tag)
-PARTITION BY HASH(report_time) PARTITIONS 8
-STORED AS KUDU
-AS SELECT
-  UNIX_TIMESTAMP(report_time,  'MM/dd/yyyy HH:mm:ss') AS report_time,
-  vehicle_tag,
-  longitude,
-  latitude,
-  speed,
-  heading
-FROM sfmta_raw;
-
-+------------------------+
-| summary                |
-+------------------------+
-| Inserted 859086 row(s) |
-+------------------------+
-Fetched 1 row(s) in 5.75s</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>The created table uses a composite primary key. See
-<a href="kudu_impala_integration.html#kudu_impala">Kudu Impala Integration</a> for a more detailed
-introduction to the extended SQL syntax for Impala.</p>
-</div>
-</li>
-</ol>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_read_and_modify_data"><a class="link" href="#_read_and_modify_data">Read and Modify Data</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>Now that the data is stored in Kudu, you can run queries against it. The following query
-finds the data point containing the highest recorded vehicle speed.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">SELECT * FROM sfmta ORDER BY speed DESC LIMIT 1;
-
-+-------------+-------------+--------------------+-------------------+-------------------+---------+
-| report_time | vehicle_tag | longitude          | latitude          | speed             | heading |
-+-------------+-------------+--------------------+-------------------+-------------------+---------+
-| 1357022342  | 5411        | -122.3968811035156 | 37.76665878295898 | 68.33300018310547 | 82      |
-+-------------+-------------+--------------------+-------------------+-------------------+---------+</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>With a quick <a href="https://www.google.com/search?q=122.3968811035156W+37.76665878295898N">Google search</a>
-we can see that this bus was traveling east on 16th street at 68MPH.
-At first glance, this seems unlikely to be true. Perhaps we do some research
-and find that this bus&#8217;s sensor equipment was broken and we decide to
-remove the data. With Kudu this is very easy to correct using standard
-SQL:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-sql" data-lang="sql">DELETE FROM sfmta WHERE vehicle_tag = 5411;
-
--- Modified 1169 row(s), 0 row error(s) in 0.25s</code></pre>
-</div>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_next_steps"><a class="link" href="#_next_steps">Next steps</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>The above example showed how to load, query, and mutate a static dataset with Impala
-and Kudu. The real power of Kudu, however, is the ability to ingest and mutate data
-in a streaming fashion.</p>
-</div>
-<div class="paragraph">
-<p>As an exercise to learn the Kudu programmatic APIs, try implementing a program
-that uses the <a href="http://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf">SFMTA
-XML data feed</a> to ingest this same dataset in real time into the Kudu table.</p>
-</div>
-<div class="sect2">
-<h3 id="trouble"><a class="link" href="#trouble">Troubleshooting</a></h3>
-<div class="sect3">
-<h4 id="_problems_accessing_the_vm_via_ssh"><a class="link" href="#_problems_accessing_the_vm_via_ssh">Problems accessing the VM via SSH</a></h4>
-<div class="ulist">
-<ul>
-<li>
-<p>Make sure the host has an SSH client installed.</p>
-</li>
-<li>
-<p>Make sure the VM is running, by running the following command and checking for a VM called <code>kudu-demo</code>:</p>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-bash" data-lang="bash">$ VBoxManage list runningvms</code></pre>
-</div>
-</div>
-</li>
-<li>
-<p>Verify that the VM&#8217;s IP address is included in the host&#8217;s <code>/etc/hosts</code> file. You should
-see a line that includes an IP address followed by the hostname
-<code>quickstart.cloudera</code>. To check the running VM&#8217;s IP address, use the <code>VBoxManage</code>
-command below.</p>
-<div class="listingblock">
-<div class="content">
-<pre class="highlight"><code class="language-bash" data-lang="bash">$ VBoxManage guestproperty get kudu-demo /VirtualBox/GuestInfo/Net/0/V4/IP
-Value: 192.168.56.100</code></pre>
-</div>
-</div>
-</li>
-<li>
-<p>If you&#8217;ve used a Cloudera Quickstart VM before, your <code>.ssh/known_hosts</code> file may
-contain references to the previous VM&#8217;s SSH credentials. Remove any references to
-<code>quickstart.cloudera</code> from this file.</p>
-</li>
-</ul>
-</div>
-</div>
-<div class="sect3">
-<h4 id="_failing_with_lack_of_sse4_2_support_when_running_inside_virtualbox"><a class="link" href="#_failing_with_lack_of_sse4_2_support_when_running_inside_virtualbox">Failing with lack of SSE4.2 support when running inside VirtualBox</a></h4>
-<div class="ulist">
-<ul>
-<li>
-<p>Running Kudu currently requires a CPU that supports SSE4.2 (Nehalem or later for Intel). To pass through SSE4.2 support into the guest VM, refer to the <a href="https://www.virtualbox.org/manual/ch09.html#sse412passthrough">VirtualBox documentation</a>.</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_next_steps_2"><a class="link" href="#_next_steps_2">Next Steps</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p><a href="installation.html">Installing Kudu</a></p>
-</li>
-<li>
-<p><a href="configuration.html">Configuring Kudu</a></p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-    </div>
-    <div class="col-md-3">
-
-  <div id="toc" data-spy="affix" data-offset-top="70">
-  <ul>
-
-      <li>
-
-          <a href="index.html">Introducing Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="release_notes.html">Kudu Release Notes</a> 
-      </li> 
-      <li>
-<span class="active-toc">Getting Started with Kudu</span>
-            <ul class="sectlevel1">
-<li><a href="#quickstart_vm">Get The Kudu Quickstart VM</a>
-<ul class="sectlevel2">
-<li><a href="#_prerequisites">Prerequisites</a></li>
-<li><a href="#_installation">Installation</a></li>
-</ul>
-</li>
-<li><a href="#_load_data">Load Data</a></li>
-<li><a href="#_read_and_modify_data">Read and Modify Data</a></li>
-<li><a href="#_next_steps">Next steps</a>
-<ul class="sectlevel2">
-<li><a href="#trouble">Troubleshooting</a></li>
-</ul>
-</li>
-<li><a href="#_next_steps_2">Next Steps</a></li>
-</ul> 
-      </li> 
-      <li>
-
-          <a href="installation.html">Installation Guide</a> 
-      </li> 
-      <li>
-
-          <a href="configuration.html">Configuring Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="administration.html">Administering Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="developing.html">Developing Applications with Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="schema_design.html">Kudu Schema Design</a> 
-      </li> 
-      <li>
-
-          <a href="security.html">Kudu Security</a> 
-      </li> 
-      <li>
-
-          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
-      </li> 
-      <li>
-
-          <a href="background_tasks.html">Background Maintenance Tasks</a> 
-      </li> 
-      <li>
-
-          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
-      </li> 
-      <li>
-
-          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
-      </li> 
-      <li>
-
-          <a href="known_issues.html">Known Issues and Limitations</a> 
-      </li> 
-      <li>
-
-          <a href="contributing.html">Contributing to Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="export_control.html">Export Control Notice</a> 
-      </li> 
-  </ul>
-  </div>
-    </div>
-  </div>
-</div>
-
-
-  <div id="footnotes">
-  <hr>
-      <div class="footnote" id="_footnote_1">
-      <a href="#_footnoteref_1">1</a>. In addition, the script will create a host-only network between host and guest and setup an entry in the <code>/etc/hosts</code> file with the name <code>quickstart.cloudera</code> and the guest&#8217;s IP address.
-      </div>
-  </div>
\ No newline at end of file
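
The load step in the quickstart above converts DOS-style line endings with `tr -d '\r'` before the upload. As a minimal sketch of what that filter does, the snippet below runs it on a made-up two-row sample. The `strip_cr` helper name and the sample rows are illustrative assumptions, not part of the quickstart.

```shell
# Hypothetical helper (not from the quickstart): delete every carriage
# return from stdin, turning CRLF line endings into plain LF.
strip_cr() {
  tr -d '\r'
}

# Made-up sample rows with DOS-style (CRLF) endings; od -c makes the
# remaining line-ending characters visible.
printf '1,foo\r\n2,bar\r\n' | strip_cr | od -c
```

In the `od -c` dump, each row should end in `\n` with no `\r` before it. Because `tr -d '\r'` removes every carriage return, not only those at line ends, it suits simple CSV files whose fields are not expected to contain carriage returns.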

http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/release_notes.html
----------------------------------------------------------------------
diff --git a/docs/release_notes.html b/docs/release_notes.html
deleted file mode 100644
index fe866a6..0000000
--- a/docs/release_notes.html
+++ /dev/null
@@ -1,662 +0,0 @@
----
-title: Apache Kudu 1.7.1 Release Notes
-layout: default
-active_nav: docs
-last_updated: 'Last updated 2018-06-15 07:22:05 PDT'
----
-<!--
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-
-<div class="container">
-  <div class="row">
-    <div class="col-md-9">
-
-<h1>Apache Kudu 1.7.1 Release Notes</h1>
-      <div class="sect1">
-<h2 id="rn_1.7.1_fixed_issues"><a class="link" href="#rn_1.7.1_fixed_issues">Fixed Issues</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>Apache Kudu 1.7.1 is a bug-fix release which fixes critical issues in Kudu 1.7.0.</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>Fixed an issue where a leader replica could report a follower&#8217;s health status
-as FAILED instead of FAILED_UNRECOVERABLE. In configurations where the tablet
-replication factor equals the total number of tablet servers in the cluster,
-that led to situations where the tablet could not be automatically recovered
-until a new leader was elected or the corresponding tablet servers were restarted.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2367">KUDU-2367</a>).</p>
-</li>
-<li>
-<p>Fixed an issue where Kudu would fail to start if RLIMIT_NPROC was set to -1.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2377">KUDU-2377</a>).</p>
-</li>
-<li>
-<p>Fixed an issue where <code>kudu-spark</code> was unable to connect to secure clusters.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2379">KUDU-2379</a>).</p>
-</li>
-<li>
-<p>Fixed an issue where the <code>kudu-python</code> client would not compile in environments
-where <code>__int128</code> is not supported. This was most commonly el6 environments.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2412">KUDU-2412</a>).</p>
-</li>
-<li>
-<p>Fixed an issue where unaligned loads of <code>__int128</code> integers could result
-in a crash.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2378">KUDU-2378</a>).</p>
-</li>
-<li>
-<p>Fixed a bug in <code>PartialRow.setMin</code> that could lead to incorrect partition
-pruning when a <code>decimal</code> column is part of the table&#8217;s range partition but
-not a part of the query predicate.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2416">KUDU-2416</a>).</p>
-</li>
-<li>
-<p>Fixed an equality check on <code>decimal</code> column predicates that could result
-in pruning that is too conservative.</p>
-</li>
-<li>
-<p>Fixed an issue where ColumnSchema.toString() would throw a
-NullPointerException on non-decimal types.</p>
-</li>
-<li>
-<p>Added an optimization that improves the performance when scanning tables
-with large consecutive runs of deleted rows. For example, users may use
-'DELETE' to remove all rows in a table or partition before re-adding them, or they
-may delete all data corresponding to some prefix of the PK.</p>
-</li>
-<li>
-<p>Fixed an issue where moving single-replica tablets via
-<code>kudu tablet change_config move_replica</code> does not work.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2443">KUDU-2443</a>).</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<h1 id="rn_1.7.0_release_notes" class="sect0"><a class="link" href="#rn_1.7.0_release_notes">Apache Kudu 1.7.0 Release Notes</a></h1>
-<div class="sect1">
-<h2 id="rn_1.7.0_upgrade_notes"><a class="link" href="#rn_1.7.0_upgrade_notes">Upgrade Notes</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p>Upgrading directly from Kudu 1.6.0 is supported and no special upgrade steps
-are required. A rolling upgrade of the server side will <em>not</em> work because
-the default replica management scheme changed, and running masters and tablet
-servers with different replica management schemes is not supported, see
-<a href="#rn_1.7.0_incompatible_changes">Incompatible Changes in Kudu 1.7.0</a> for details. However, mixing client and
-server sides of different versions is not a problem. You can still
-update your clients before your servers or vice versa.
-When upgrading to Kudu 1.7, it is required to first shut down all Kudu processes
-across the cluster, then upgrade the software on all servers, then restart
-the Kudu processes on all servers in the cluster.</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="rn_1.7.0_obsoletions"><a class="link" href="#rn_1.7.0_obsoletions">Obsoletions</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p>The <code>tcmalloc_contention_time</code> metric, which previously tracked the amount
-of time spent in memory allocator lock contention, has been removed.</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="rn_1.7.0_deprecations"><a class="link" href="#rn_1.7.0_deprecations">Deprecations</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p>Support for Java 7 has been deprecated since Kudu 1.5.0 and may be removed in
-the next major release.</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="rn_1.7.0_new_features"><a class="link" href="#rn_1.7.0_new_features">New features</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p>Kudu now supports the decimal column type. The decimal type is a numeric data type
-with fixed scale and precision suitable for financial and other arithmetic
-calculations where the imprecise representation and rounding behavior of float and
-double make those types impractical. The decimal type is also useful for integers
-larger than int64 and cases with fractional values in a primary key.
-See <a href="schema_design.html#decimal">Decimal Type</a> for more details.</p>
-</li>
-<li>
-<p>The strategy Kudu uses for automatically healing tablets which have lost a
-replica due to server or disk failures has been improved. The new re-replication
-strategy, or replica management scheme, first adds a replacement tablet replica
-before evicting the failed one. With the previous replica management scheme,
-the system first evicts the failed replica and then adds a replacement. The new
-replica management scheme allows for much faster recovery of tablets in
-scenarios where one tablet server goes down and then comes back shortly afterwards,
-after 5 minutes or so. The new scheme also provides substantially better overall
-stability on clusters with frequent server failures.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-1097">KUDU-1097</a>).</p>
-</li>
-<li>
-<p>The <code>kudu fs update_dirs</code> tool now supports removing directories. Unless the
-<code>--force</code> flag is specified, Kudu will not allow the removal of a directory
-across which tablets are configured to spread data. If specified, all tablet
-replicas configured to use that directory will fail upon starting up and be
-replicated elsewhere, provided a majority exists elsewhere.</p>
-</li>
-<li>
-<p>Users can use the new <code>--fs_metadata_dir</code> to specify the directory in which
-to place tablet-specific metadata. It is recommended, although not
-necessary, that this be placed on a high-performance drive with high
-bandwidth and low latency, e.g. a solid-state drive. If not specified,
-metadata will be placed in the directory specified by <code>--fs_wal_dir</code>, or in
-the directory specified by the first entry of <code>--fs_data_dirs</code> if metadata
-already exists there from a pre-Kudu 1.7 deployment. Kudu will not
-automatically move existing metadata based on this configuration.</p>
-</li>
-<li>
-<p>Kudu 1.7 introduces a new scan read mode READ_YOUR_WRITES. Users can specify
-READ_YOUR_WRITES when creating a new scanner in C++, Java and Python clients.
-If this mode is used, the client will perform a read such that it follows all
-previously known writes and reads from this client. Reads in this mode ensure
-read-your-writes and read-your-reads session guarantees, while minimizing
-latency caused by waiting for outstanding write transactions to complete.
-Note that this is still an experimental feature which may be stabilized in
-future releases.</p>
-</li>
-<li>
-<p>The tablet server web UI scans dashboard (/scans) has been improved with
-several new features, including: showing the most recently completed scans,
-a pseudo-SQL scan descriptor that concisely shows the selected columns and
-applied predicates, and more complete and better documented scan statistics.</p>
-</li>
-<li>
-<p>Kudu daemons now expose a web page <code>/stacks</code> which dumps the current stack
-trace of every thread running in the server. This information can be helpful
-when diagnosing performance issues.</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_optimizations_and_improvements"><a class="link" href="#_optimizations_and_improvements">Optimizations and improvements</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p>By default, each tablet replica will now stripe data blocks across 3 data
-directories instead of all data directories. This decreases the likelihood
-that any given tablet will be affected in the event of a single disk failure.
-No substantial performance impact is expected due to this feature based on
-<a href="https://github.com/apache/kudu/commit/60276c54a221d554287c6645df7df542fe6d6443">performance testing</a>.
-This change only affects new replicas created after upgrading to Kudu 1.7.</p>
-</li>
-<li>
-<p>Kudu servers previously offered the ability to enable a separate metrics log
-which stores periodic snapshots of all metrics available on a server. This
-functionality is now available as part of a more general “diagnostics log”
-which is enabled by default. The diagnostics log includes periodic dumps of
-server metrics as well as collections of thread stack traces. The default
-configuration ensures that no more than 640MB of diagnostics logs are retained,
-and typically the space consumption is significantly less due to compression.
-The format and contents of this log file are documented in the
-<a href="administration.html">Administration guide</a>.</p>
-</li>
-<li>
-<p>The handling of errors in the synchronous Java client has been improved so that,
-when an exception is thrown, the stack trace indicates the correct location
-where the client function was invoked rather than a call stack of an internal
-worker thread. The original call stack from the worker thread is available as
-a “suppressed exception”.</p>
-</li>
-<li>
-<p>The logging of errors in the Java client has been improved to exclude exception
-stack traces for expected scenarios such as failure to connect to a server in a
-cluster. Instead, only a single line informational message will be logged in
-such cases to aid in debugging.</p>
-</li>
-<li>
-<p>The Java client now uses a predefined prioritized list of TLS ciphers when
-establishing an encrypted connection to Kudu servers. This cipher list matches
-the list of ciphers preferred for server-to-server communication and ensures
-that the most efficient and secure ciphers are preferred. When the Kudu client
-is running on Java 8 or newer, this provides a substantial speed-up to read
-and write performance.</p>
-</li>
-<li>
-<p>Reporting for the <code>kudu cluster ksck</code> tool has been updated so tablets and
-tables with on-going tablet copies are shown as "recovering". Additional
-reporting changes have been made to make various common scenarios,
-particularly tablet copies, less alarming.</p>
-</li>
-<li>
-<p>The performance of inserting rows containing many string or binary columns has
-been improved, especially in the case of highly concurrent write workloads.</p>
-</li>
-<li>
-<p>By default, Spark tasks that scan Kudu will now be able to scan non-leader
-replicas. This allows Spark to more easily schedule kudu-spark tasks local to
-the data. Users can disable this behavior by passing 'leader_only' to the
-'kudu.scanLocality' option.</p>
-</li>
-<li>
-<p>The number of OS threads used in the steady state and during bursts of
-activity (such as in Raft leader elections triggered by a node failure) has
-been drastically reduced and should no longer exceed the value of <code>ulimit -u</code>.
-As such, it should no longer be necessary to increase the value of <code>ulimit -u</code>
-(or of /proc/sys/kernel/threads-max) in order to run a Kudu tablet server in
-most cases.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-1913">KUDU-1913</a>).</p>
-</li>
-<li>
-<p>An issue where sparse column predicates could cause excessive data-block reads
-has been fixed. Previously in certain scans with sparsely matching predicates
-on multiple columns, Kudu would read and decode the same data blocks many times.
-The improvement typically results in a 5-10x performance increase for the
-affected scans.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2231">KUDU-2231</a>).</p>
-</li>
-<li>
-<p>The efficiency and on-disk size of large updated values has been improved.
-This will improve update-heavy workloads which overwrite large (1KiB+) values.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2253">KUDU-2253</a>).</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="rn_1.7.0_fixed_issues"><a class="link" href="#rn_1.7.0_fixed_issues">Fixed Issues</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p>Fixed a scenario where the on-disk data of a tablet server was completely
-erased and a new tablet server was started on the same host. This issue
-could prevent tablet replicas previously hosted on the server from being
-evicted and re-replicated.
-Tablets now immediately evict replicas that respond with a different server
-UUID than expected.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-1613">KUDU-1613</a>).</p>
-</li>
-<li>
-<p>Fixed a rare race condition when connecting to masters during their
-startup which might cause a client to get a response without a CA certificate
-and/or authentication token. This would cause the client to fail to authenticate
-with other servers in the cluster. The leader master now always sends a CA
-certificate and an authentication token (when applicable) to a Kudu client
-with a successful ConnectToMaster response.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-1927">KUDU-1927</a>).</p>
-</li>
-<li>
-<p>The Kudu Java client now will retry a connection if no master is discovered as a
-leader, and the user has a valid authentication token. This avoids failure
-in recoverable cases when masters are in the process of the very first leader
-election after starting up.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2262">KUDU-2262</a>).</p>
-</li>
-<li>
-<p>The Java client will now automatically attempt to re-acquire Kerberos
-credentials from the ticket cache when the prior credentials are about to
-expire. This allows client instances to persist longer than the expiration
-time of a single Kerberos ticket so long as some other process renews the
-credentials in the ticket cache. Documentation on interacting with Kerberos
-authentication has been added to the Javadoc for the <code>AsyncKuduClient</code> class.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2264">KUDU-2264</a>).</p>
-</li>
-<li>
-<p>Follower masters are now able to verify authentication tokens even if they have never
-been a leader. Prior to this fix, if a follower master had never been a leader,
-clients would be unable to authenticate to that master, resulting in spurious
-error messages being logged.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2265">KUDU-2265</a>).</p>
-</li>
-<li>
-<p>Fixed a tablet server crash when a tablet replica is deleted during a scan.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2295">KUDU-2295</a>).</p>
-</li>
-<li>
-<p>The evaluation order of predicates in scans with multiple predicates has been
-made deterministic. Due to a bug, this was not necessarily the case previously.
-Predicates are applied in most to least selective order, with ties broken by
-column index. The evaluation order may change in the future, particularly when
-better column statistics are made available internally.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2312">KUDU-2312</a>).</p>
-</li>
-<li>
-<p>Previously, the <code>kudu tablet change_config move_replica</code> tool required all
-tablet servers in the cluster to be available when performing a move. This
-restriction has been relaxed: only the tablet server that will receive a replica
-of the tablet being moved and the hosts of the tablet&#8217;s existing replicas need to be
-available for the move to occur.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2331">KUDU-2331</a>).</p>
-</li>
-<li>
-<p>Fixed a bug in the Java client which prevented the client from locating the
-new leader master after a leader failover in the case that the previous leader
-either remained online or restarted quickly. This bug resulted in the client
-timing out operations with errors indicating that there was no leader master.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2343">KUDU-2343</a>).</p>
-</li>
-<li>
-<p>The Unix process username of the client is now included inside the exported
-security credentials, so that the effective username of clients who import
-credentials and subsequently use unauthenticated (SASL PLAIN) connections
-matches the client who exported the security credentials. For example, this is
-useful to let the Spark executors know which username to use if the Spark
-driver has no authentication token. This change only affects clusters with
-encryption disabled using <code>--rpc-encryption=disabled</code>.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2259">KUDU-2259</a>).</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="rn_1.7.0_wire_compatibility"><a class="link" href="#rn_1.7.0_wire_compatibility">Wire Protocol compatibility</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>Kudu 1.7.0 is wire-compatible with previous versions of Kudu:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>Kudu 1.7 clients may connect to servers running Kudu 1.0 or later. If the client uses
-features that are not available on the target server, an error will be returned.</p>
-</li>
-<li>
-<p>Rolling upgrade between Kudu 1.6 and Kudu 1.7 servers is believed to be possible
-but has not been sufficiently tested. Users are encouraged to shut down all nodes
-in the cluster, upgrade the software, and then restart the daemons on the new version.</p>
-</li>
-<li>
-<p>Kudu 1.0 clients may connect to servers running Kudu 1.7 with the exception of the
-below-mentioned restrictions regarding secure clusters.</p>
-</li>
-</ul>
-</div>
-<div class="paragraph">
-<p>The authentication features introduced in Kudu 1.3 place the following limitations
-on wire compatibility between Kudu 1.7 and versions earlier than 1.3:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>If a Kudu 1.7 cluster is configured with authentication or encryption set to "required",
-clients older than Kudu 1.3 will be unable to connect.</p>
-</li>
-<li>
-<p>If a Kudu 1.7 cluster is configured with authentication and encryption set to "optional"
-or "disabled", older clients will still be able to connect.</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="rn_1.7.0_incompatible_changes"><a class="link" href="#rn_1.7.0_incompatible_changes">Incompatible Changes in Kudu 1.7.0</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p>The newly introduced replica management scheme is not compatible with the
-old scheme, so it&#8217;s not possible to run pre-1.7 Kudu masters with
-1.7 Kudu tablet servers or vice versa. This is a server-side
-incompatibility only and it does not affect client compatibility. In other words,
-Kudu clients of prior versions are compatible with upgraded Kudu clusters.</p>
-<div class="ulist">
-<ul>
-<li>
-<p>Kudu 1.7 masters will not register tablet servers running Kudu 1.6
-or earlier.</p>
-</li>
-<li>
-<p>Kudu 1.7 tablet servers will not work with masters running Kudu 1.6
-or earlier.</p>
-</li>
-</ul>
-</div>
-</li>
-<li>
-<p>The format of the previously-optional metrics log has changed to include a
-human-readable timestamp on each line. The path of the log file has also
-changed with the word “diagnostics” replacing the word “metrics” in the file
-name. The metrics log has been optimized to only include those metrics which
-have changed in between successive samples, and to not include entity attributes
-such as tablet partition information in the log.
-(see <a href="https://issues.apache.org/jira/browse/KUDU-2297">KUDU-2297</a>).</p>
-</li>
-</ul>
-</div>
-<div class="sect2">
-<h3 id="rn_1.7.0_client_compatibility"><a class="link" href="#rn_1.7.0_client_compatibility">Client Library Compatibility</a></h3>
-<div class="ulist">
-<ul>
-<li>
-<p>The Kudu 1.7 Java client library is API- and ABI-compatible with Kudu 1.6. Applications
-written against Kudu 1.6 will compile and run against the Kudu 1.7 client library and
-vice-versa.</p>
-</li>
-<li>
-<p>The Kudu 1.7 C++ client is API- and ABI-forward-compatible with Kudu 1.6.
-Applications written and compiled against the Kudu 1.6 client library will run without
-modification against the Kudu 1.7 client library. Applications written and compiled
-against the Kudu 1.7 client library will run without modification against the Kudu 1.6
-client library.</p>
-</li>
-<li>
-<p>The Kudu 1.7 Python client is API-compatible with Kudu 1.6. Applications
-written against Kudu 1.6 will continue to run against the Kudu 1.7 client
-and vice-versa.</p>
-</li>
-<li>
-<p>Kudu 1.7 clients that attempt to create a table with a decimal column on a
-target server running Kudu 1.6 or earlier will receive an error response.
-Similarly, clients running Kudu 1.6 or earlier will receive an error
-when attempting to access any table containing a decimal
-column.</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="rn_1.7.0_known_issues"><a class="link" href="#rn_1.7.0_known_issues">Known Issues and Limitations</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>Please refer to the <a href="known_issues.html">Known Issues and Limitations</a> section of the
-documentation.</p>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="rn_1.7.0_contributors"><a class="link" href="#rn_1.7.0_contributors">Contributors</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>Kudu 1.7 includes contributions from 22 people, including two first-time
-contributors, Clemens Valiente and Tsuyoshi Ozawa.</p>
-</div>
-<div class="paragraph">
-<p>Thank you for helping to make Kudu even better!</p>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="resources_and_next_steps"><a class="link" href="#resources_and_next_steps">Resources</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p><a href="http://kudu.apache.org">Kudu Website</a></p>
-</li>
-<li>
-<p><a href="http://github.com/apache/kudu">Kudu GitHub Repository</a></p>
-</li>
-<li>
-<p><a href="index.html">Kudu Documentation</a></p>
-</li>
-<li>
-<p><a href="prior_release_notes.html">Release notes for older releases</a></p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_installation_options"><a class="link" href="#_installation_options">Installation Options</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>For full installation details, see <a href="installation.html">Kudu Installation</a>.</p>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_next_steps"><a class="link" href="#_next_steps">Next Steps</a></h2>
-<div class="sectionbody">
-<div class="ulist">
-<ul>
-<li>
-<p><a href="quickstart.html">Kudu Quickstart</a></p>
-</li>
-<li>
-<p><a href="installation.html">Installing Kudu</a></p>
-</li>
-<li>
-<p><a href="configuration.html">Configuring Kudu</a></p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-    </div>
-    <div class="col-md-3">
-
-  <div id="toc" data-spy="affix" data-offset-top="70">
-  <ul>
-
-      <li>
-
-          <a href="index.html">Introducing Kudu</a> 
-      </li> 
-      <li>
-<span class="active-toc">Kudu Release Notes</span>
-            <ul class="sectlevel1">
-<li><a href="#rn_1.7.1_fixed_issues">Fixed Issues</a></li>
-<li><a href="#rn_1.7.0_release_notes">Apache Kudu 1.7.0 Release Notes</a>
-<ul class="sectlevel1">
-<li><a href="#rn_1.7.0_upgrade_notes">Upgrade Notes</a></li>
-<li><a href="#rn_1.7.0_obsoletions">Obsoletions</a></li>
-<li><a href="#rn_1.7.0_deprecations">Deprecations</a></li>
-<li><a href="#rn_1.7.0_new_features">New features</a></li>
-<li><a href="#_optimizations_and_improvements">Optimizations and improvements</a></li>
-<li><a href="#rn_1.7.0_fixed_issues">Fixed Issues</a></li>
-<li><a href="#rn_1.7.0_wire_compatibility">Wire Protocol compatibility</a></li>
-<li><a href="#rn_1.7.0_incompatible_changes">Incompatible Changes in Kudu 1.7.0</a>
-<ul class="sectlevel2">
-<li><a href="#rn_1.7.0_client_compatibility">Client Library Compatibility</a></li>
-</ul>
-</li>
-<li><a href="#rn_1.7.0_known_issues">Known Issues and Limitations</a></li>
-<li><a href="#rn_1.7.0_contributors">Contributors</a></li>
-<li><a href="#resources_and_next_steps">Resources</a></li>
-<li><a href="#_installation_options">Installation Options</a></li>
-<li><a href="#_next_steps">Next Steps</a></li>
-</ul>
-</li>
-</ul> 
-      </li> 
-      <li>
-
-          <a href="quickstart.html">Getting Started with Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="installation.html">Installation Guide</a> 
-      </li> 
-      <li>
-
-          <a href="configuration.html">Configuring Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="administration.html">Administering Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="developing.html">Developing Applications with Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="schema_design.html">Kudu Schema Design</a> 
-      </li> 
-      <li>
-
-          <a href="security.html">Kudu Security</a> 
-      </li> 
-      <li>
-
-          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
-      </li> 
-      <li>
-
-          <a href="background_tasks.html">Background Maintenance Tasks</a> 
-      </li> 
-      <li>
-
-          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
-      </li> 
-      <li>
-
-          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
-      </li> 
-      <li>
-
-          <a href="known_issues.html">Known Issues and Limitations</a> 
-      </li> 
-      <li>
-
-          <a href="contributing.html">Contributing to Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="export_control.html">Export Control Notice</a> 
-      </li> 
-  </ul>
-  </div>
-    </div>
-  </div>
-</div>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kudu/blob/1fefa84c/docs/scaling_guide.html
----------------------------------------------------------------------
diff --git a/docs/scaling_guide.html b/docs/scaling_guide.html
deleted file mode 100644
index 24002c1..0000000
--- a/docs/scaling_guide.html
+++ /dev/null
@@ -1,455 +0,0 @@
----
-title: Apache Kudu Scaling Guide
-layout: default
-active_nav: docs
-last_updated: 'Last updated 2018-06-14 08:17:56 PDT'
----
-<!--
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-
-<div class="container">
-  <div class="row">
-    <div class="col-md-9">
-
-<h1>Apache Kudu Scaling Guide</h1>
-      <div id="preamble">
-<div class="sectionbody">
-<div class="paragraph">
-<p>This document describes in detail how Kudu scales with respect to various system resources,
-including memory, file descriptors, and threads. See the
-<a href="known_issues.html#_scale">scaling limits</a> for the maximum recommended parameters of a Kudu
-cluster. They can be used to estimate roughly the number of servers required for a given quantity
-of data.</p>
-</div>
-<div class="admonitionblock warning">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-warning" title="Warning"></i>
-</td>
-<td class="content">
-The recommendations and conclusions here are only approximations. Appropriate numbers
-depend on use case. There is no substitute for measurement and monitoring of resources used during a
-representative workload.
-</td>
-</tr>
-</table>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_terms"><a class="link" href="#_terms">Terms</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>We will use the following terms:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p><strong>hot replica</strong>: A tablet replica that is continuously receiving writes. For example, in a time
-series use case, tablet replicas for the most recent range partition on a time column would be
-continuously receiving the latest data, and would be hot replicas.</p>
-</li>
-<li>
-<p><strong>cold replica</strong>: A tablet replica that is not hot, i.e. a replica that receives writes
-infrequently, for example, once every few minutes. A cold replica may be read from. For example, in a time
-series use case, tablet replicas for previous range partitions on a time column would not receive
-writes at all, or only occasionally receive late updates or additions, but may be constantly read.</p>
-</li>
-<li>
-<p><strong>data on disk</strong>: The total amount of data stored on a tablet server across all disks,
-post-replication, post-compression, and post-encoding.</p>
-</li>
-</ul>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="_example_workload"><a class="link" href="#_example_workload">Example Workload</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>The sections below perform sample calculations using the following parameters:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>200 hot replicas per tablet server</p>
-</li>
-<li>
-<p>1600 cold replicas per tablet server</p>
-</li>
-<li>
-<p>8TB of data on disk per tablet server (about 4.5GB/replica)</p>
-</li>
-<li>
-<p>512MB block cache</p>
-</li>
-<li>
-<p>40 cores per server</p>
-</li>
-<li>
-<p>limit of 32000 file descriptors per server</p>
-</li>
-<li>
-<p>a read workload with 1 frequently-scanned table with 40 columns</p>
-</li>
-</ul>
-</div>
-<div class="paragraph">
-<p>This workload resembles a time series use case, where the hot replicas correspond to the most recent
-range partition on time.</p>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="memory"><a class="link" href="#memory">Memory</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>The flag <code>--memory_limit_hard_bytes</code> determines the maximum amount of memory that a Kudu tablet
-server may use. The amount of memory used by a tablet server scales with data size, write workload,
-and read concurrency. The following table provides numbers that can be used to compute a rough
-estimate of memory usage.</p>
-</div>
-<table class="tableblock frame-all grid-all spread">
-<caption class="title">Table 1. Tablet Server Memory Usage</caption>
-<colgroup>
-<col style="width: 33.3333%;">
-<col style="width: 33.3333%;">
-<col style="width: 33.3334%;">
-</colgroup>
-<thead>
-<tr>
-<th class="tableblock halign-left valign-top">Type</th>
-<th class="tableblock halign-left valign-top">Multiplier</th>
-<th class="tableblock halign-left valign-top">Description</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Memory required per TB of data on disk</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">1.5GB per 1TB data on disk</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory per unit of data on disk required for
-basic operation of the tablet server.</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Hot Replicas' MemRowSets and DeltaMemStores</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">minimum 128MB per hot replica</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Minimum amount of data
-to flush per MemRowSet flush. For most use cases, updates should be rare compared to inserts, so the
-DeltaMemStores should be very small.</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Scans</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">256KB per column per core for read-heavy tables</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory used by scanners, which is
-needed continuously for tables that are constantly read.</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Block Cache</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Fixed by <code>--block_cache_capacity_mb</code> (default 512MB)</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory reserved for use by the
-block cache.</p></td>
-</tr>
-</tbody>
-</table>
-<div class="paragraph">
-<p>Using this information for the example load gives the following breakdown of memory usage:</p>
-</div>
-<table class="tableblock frame-all grid-all spread">
-<caption class="title">Table 2. Example Tablet Server Memory Usage</caption>
-<colgroup>
-<col style="width: 50%;">
-<col style="width: 50%;">
-</colgroup>
-<thead>
-<tr>
-<th class="tableblock halign-left valign-top">Type</th>
-<th class="tableblock halign-left valign-top">Amount</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">8TB data on disk</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">8TB * 1.5GB / 1TB = 12GB</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">200 hot replicas</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">200 * 128MB = 25.6GB</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">1 40-column, frequently-scanned table</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">40 * 40 * 256KB = 409.6MB</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Block Cache</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>--block_cache_capacity_mb=512</code> = 512MB</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Expected memory usage</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">38.5GB</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Recommended hard limit</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">52GB</p></td>
-</tr>
-</tbody>
-</table>
-<div class="paragraph">
-<p>Using this as a rough estimate of Kudu&#8217;s memory usage, select a memory limit so that the expected
-memory usage of Kudu is around 50-75% of the hard limit.</p>
-</div>
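-<div class="paragraph">
-<p>The arithmetic in Table 2 can be reproduced with a short script. This is a minimal sketch, not
-part of Kudu: the function and parameter names are hypothetical, and it assumes the decimal units
-(1GB = 1000MB) that the tables above appear to use. It also derives the range of hard limits for which
-the expected usage falls between 50% and 75% of the limit.</p>
-</div>

```python
def estimate_tablet_server_memory_gb(data_on_disk_tb, hot_replicas,
                                     scanned_columns, cores,
                                     block_cache_mb=512):
    """Rough memory estimate in GB using the multipliers from Table 1.

    Assumes decimal units (1 GB = 1000 MB = 1,000,000 KB).
    """
    per_disk_gb = data_on_disk_tb * 1.5                   # 1.5GB per 1TB on disk
    hot_gb = hot_replicas * 128 / 1000                    # 128MB per hot replica
    scans_gb = scanned_columns * cores * 256 / 1_000_000  # 256KB per column per core
    cache_gb = block_cache_mb / 1000                      # fixed block cache size
    return per_disk_gb + hot_gb + scans_gb + cache_gb


def hard_limit_range_gb(expected_gb, low=0.50, high=0.75):
    """Hard limits for which expected usage is between `low` and `high` of the limit."""
    return expected_gb / high, expected_gb / low


# Example workload: 8TB on disk, 200 hot replicas, one 40-column table, 40 cores.
expected = estimate_tablet_server_memory_gb(8, 200, 40, 40)
smallest_limit, largest_limit = hard_limit_range_gb(expected)
print(f"expected usage: {expected:.1f} GB")
print(f"hard limit between {smallest_limit:.1f} GB and {largest_limit:.1f} GB")
```

-<div class="paragraph">
-<p>With the example workload, the recommended 52GB hard limit sits near the low end of the computed
-range, which corresponds to expected usage at roughly 75% of the limit.</p>
-</div>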
-<div class="sect2">
-<h3 id="_verifying_if_a_memory_limit_is_sufficient"><a class="link" href="#_verifying_if_a_memory_limit_is_sufficient">Verifying if a Memory Limit is sufficient</a></h3>
-<div class="paragraph">
-<p>After configuring an appropriate memory limit with <code>--memory_limit_hard_bytes</code>, run a workload and
-monitor the Kudu tablet server process&#8217;s RAM usage. The memory usage should stay around 50-75% of
-the hard limit, with occasional spikes above 75% but below 100%. If the tablet server runs above 75%
-consistently, the memory limit should be increased.</p>
-</div>
-<div class="paragraph">
-<p>It&#8217;s also useful to monitor the logs for memory rejections, which look like:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>Service unavailable: Soft memory limit exceeded (at 96.35% of capacity)</pre>
-</div>
-</div>
-<div class="paragraph">
-<p>and watch the memory rejections metrics:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p><code>leader_memory_pressure_rejections</code></p>
-</li>
-<li>
-<p><code>follower_memory_pressure_rejections</code></p>
-</li>
-<li>
-<p><code>transaction_memory_pressure_rejections</code></p>
-</li>
-</ul>
-</div>
-<div class="paragraph">
-<p>Occasional rejections due to memory pressure are fine and act as backpressure to clients. Clients
-will transparently retry operations. However, no operations should time out.</p>
-</div>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="file_descriptors"><a class="link" href="#file_descriptors">File Descriptors</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>Processes are allotted a maximum number of open file descriptors (also referred to as fds). If a
-tablet server attempts to open too many fds, it will typically crash with a message saying something
-like "too many open files". The following table summarizes the sources of file descriptor usage in a
-Kudu tablet server process:</p>
-</div>
-<table class="tableblock frame-all grid-all spread">
-<caption class="title">Table 3. Tablet Server File Descriptor Usage</caption>
-<colgroup>
-<col style="width: 33.3333%;">
-<col style="width: 33.3333%;">
-<col style="width: 33.3334%;">
-</colgroup>
-<thead>
-<tr>
-<th class="tableblock halign-left valign-top">Type</th>
-<th class="tableblock halign-left valign-top">Multiplier</th>
-<th class="tableblock halign-left valign-top">Description</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">File cache</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Fixed by <code>--block_manager_max_open_files</code> (default 40% of process maximum)</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum allowed open fds reserved for use by
-the file cache.</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Hot replicas</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">2 per WAL segment, 1 per WAL index</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Number of fds used by hot replicas. See below
-for more explanation.</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Cold replicas</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">3 per cold replica</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Number of fds used per cold replica: 2 for the single WAL
-segment and 1 for the single WAL index.</p></td>
-</tr>
-</tbody>
-</table>
-<div class="paragraph">
-<p>Every replica has at least one WAL segment and at least one WAL index, and should have the same
-number of segments and indices; however, the number of segments and indices can be greater for a
-replica if one of its peer replicas is falling behind. WAL segment and index fds are closed as WALs
-are garbage collected.</p>
-</div>
-<div class="paragraph">
-<p>Using this information for the example load gives the following breakdown of file descriptor usage,
-under the assumption that some replicas are lagging and using 10 WAL segments:</p>
-</div>
-<table class="tableblock frame-all grid-all spread">
-<caption class="title">Table 4. Example Tablet Server File Descriptor Usage</caption>
-<colgroup>
-<col style="width: 50%;">
-<col style="width: 50%;">
-</colgroup>
-<thead>
-<tr>
-<th class="tableblock halign-left valign-top">Type</th>
-<th class="tableblock halign-left valign-top">Amount</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">file cache</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">40% * 32000 fds = 12800 fds</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">1600 cold replicas</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">1600 cold replicas * 3 fds / cold replica = 4800 fds</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">200 hot replicas</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">(2 / segment * 10 segments/hot replica * 200 hot replicas) + (1 / index * 10 indices / hot replica * 200 hot replicas) = 6000 fds</p></td>
-</tr>
-<tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Total</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">23600 fds</p></td>
-</tr>
-</tbody>
-</table>
-<div class="paragraph">
-<p>So for this example, the tablet server process has about 32000 - 23600 = 8400 fds to spare.</p>
-</div>
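-<div class="paragraph">
-<p>The same estimate can be sketched in code. The function below is illustrative only (its name and
-parameters are hypothetical, not a Kudu API) and applies the multipliers from Table 3, under the same
-assumption that every hot replica holds 10 WAL segments and 10 WAL indices.</p>
-</div>

```python
def estimate_fds(fd_limit, cold_replicas, hot_replicas,
                 wal_segments_per_hot_replica, file_cache_percent=40):
    """Rough file descriptor estimate using the multipliers from Table 3."""
    # File cache reservation: a percentage of the process fd limit.
    file_cache = fd_limit * file_cache_percent // 100
    # Cold replica: 2 fds for its single WAL segment + 1 fd for its single WAL index.
    cold = cold_replicas * 3
    # Hot replica: 2 fds per WAL segment + 1 fd per WAL index (one index per segment).
    hot = hot_replicas * wal_segments_per_hot_replica * (2 + 1)
    total = file_cache + cold + hot
    return {
        "file_cache": file_cache,
        "cold": cold,
        "hot": hot,
        "total": total,
        "spare": fd_limit - total,
    }


# Example workload: 32000 fd limit, 1600 cold replicas, 200 hot replicas.
usage = estimate_fds(32000, 1600, 200, wal_segments_per_hot_replica=10)
for name, count in usage.items():
    print(f"{name}: {count} fds")
```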
-<div class="paragraph">
-<p>There is typically no downside to configuring a higher file descriptor limit if the tablet server
-is approaching the currently configured limit.</p>
-</div>
-</div>
-</div>
-<div class="sect1">
-<h2 id="threads"><a class="link" href="#threads">Threads</a></h2>
-<div class="sectionbody">
-<div class="paragraph">
-<p>Processes are allotted a maximum number of threads by the operating system, and this limit is
-typically difficult or impossible to change. Therefore, this section is more informational than
-advisory.</p>
-</div>
-<div class="paragraph">
-<p>If a Kudu tablet server&#8217;s thread count exceeds the OS limit, it will crash, usually with a message
-in the logs like "pthread_create failed: Resource temporarily unavailable". If the system thread
-count limit is exceeded, other processes on the same node may also crash.</p>
-</div>
-<div class="paragraph">
-<p>Threads and threadpools are used all over Kudu for various purposes, but the number of threads found
-in nearly all of these does not scale with load or data/tablet size; instead, the number of threads
-is either a hardcoded constant, a constant defined by a configuration parameter, or based on a
-static dimension (such as the number of CPU cores).</p>
-</div>
-<div class="paragraph">
-<p>The only exception to this is the WAL append thread, one of which exists for every "hot" replica.
-Note that all replicas may be considered hot at startup, so tablet servers' thread usage will
-generally peak when started and settle down thereafter.</p>
-</div>
-</div>
-</div>
-    </div>
-    <div class="col-md-3">
-
-  <div id="toc" data-spy="affix" data-offset-top="70">
-  <ul>
-
-      <li>
-
-          <a href="index.html">Introducing Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="release_notes.html">Kudu Release Notes</a> 
-      </li> 
-      <li>
-
-          <a href="quickstart.html">Getting Started with Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="installation.html">Installation Guide</a> 
-      </li> 
-      <li>
-
-          <a href="configuration.html">Configuring Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="administration.html">Administering Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="developing.html">Developing Applications with Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="schema_design.html">Kudu Schema Design</a> 
-      </li> 
-      <li>
-
-          <a href="security.html">Kudu Security</a> 
-      </li> 
-      <li>
-
-          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
-      </li> 
-      <li>
-
-          <a href="background_tasks.html">Background Maintenance Tasks</a> 
-      </li> 
-      <li>
-
-          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
-      </li> 
-      <li>
-
-          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
-      </li> 
-      <li>
-
-          <a href="known_issues.html">Known Issues and Limitations</a> 
-      </li> 
-      <li>
-
-          <a href="contributing.html">Contributing to Kudu</a> 
-      </li> 
-      <li>
-
-          <a href="export_control.html">Export Control Notice</a> 
-      </li> 
-  </ul>
-  </div>
-    </div>
-  </div>
-</div>
\ No newline at end of file

