kudu-commits mailing list archives

From t...@apache.org
Subject incubator-kudu git commit: KUDU-1385 Part 1: Modify documentation landing page
Date Thu, 23 Jun 2016 20:54:50 GMT
Repository: incubator-kudu
Updated Branches:
  refs/heads/branch-0.9.x 7af66cbe9 -> 095b481e3


KUDU-1385 Part 1: Modify documentation landing page

Rename introduction.html to index.html and get rid
of old index.html  completely.

Change-Id: Icb4b5b5ba4851d9a7974b413f8637089a157973a
Reviewed-on: http://gerrit.cloudera.org:8080/2675
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>
(cherry picked from commit 738e99384d4fa3c153922f2debeac3b5772e4c3e)


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/095b481e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/095b481e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/095b481e

Branch: refs/heads/branch-0.9.x
Commit: 095b481e308ad954cfe36ff7f91751f43eaf6aa1
Parents: 7af66cb
Author: Misty Stanley-Jones <mstanleyjones@cloudera.com>
Authored: Wed Mar 30 19:57:25 2016 -0700
Committer: Todd Lipcon <todd@apache.org>
Committed: Thu Jun 23 13:53:47 2016 -0700

----------------------------------------------------------------------
 docs/index.adoc        | 234 +++++++++++++++++++++++++++++++++++---------
 docs/introduction.adoc | 220 -----------------------------------------
 docs/style_guide.adoc  |   2 +-
 3 files changed, 189 insertions(+), 267 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/095b481e/docs/index.adoc
----------------------------------------------------------------------
diff --git a/docs/index.adoc b/docs/index.adoc
index 554def1..20a5b08 100644
--- a/docs/index.adoc
+++ b/docs/index.adoc
@@ -15,64 +15,206 @@
 // specific language governing permissions and limitations
 // under the License.
 
-= Apache Kudu (incubating) Documentation
-
-// License Header Here //
+[[introduction]]
+= Introducing Apache Kudu (incubating)
 :author: Kudu Team
 :imagesdir: ./images
 :icons: font
+:toc: left
+:toclevels: 3
 :doctype: book
 :backend: html5
 :sectlinks:
 :experimental:
 
-++++
-<div class="landing_page">
-++++
-
-link:introduction.html[Introducing Kudu]::
-  Get familiar with what sets Kudu apart.
-
-link:release_notes.html[Kudu Beta Release Notes]::
-  Find out what to expect in Kudu public beta releases, as well as known issues, workarounds,
-  and limitations.
-
-link:quickstart.html[Getting Started With Kudu]::
-  Deploy a simple proof-of-concept Kudu cluster to try it out for yourself.
-
-link:installation.html[Installation Guide]::
-  Read about all the different options for installing Kudu.
-
-link:configuration.html[Configuring Kudu]::
-  Find out how to customize your Kudu cluster.
-
-link:kudu_impala_integration.html[Using Kudu with Apache Impala (incubating)]::
-  Learn about using Impala to create, query, and update your Kudu tables.
-
-link:administration.html[Administering Kudu]::
-  Keep Kudu running smoothly.
-
-link:troubleshooting.html[Troubleshooting Kudu]::
-  Find guidelines for solving problems with your Kudu cluster.
+Kudu is a columnar storage manager developed for the Hadoop platform.  Kudu shares
+the common technical properties of Hadoop ecosystem applications: it runs on commodity
+hardware, is horizontally scalable, and supports highly available operation.
+
+Kudu's design sets it apart. Some of Kudu's benefits include:
+
+- Fast processing of OLAP workloads.
+- Integration with MapReduce, Spark and other Hadoop ecosystem components.
+- Tight integration with Cloudera Impala, making it a good, mutable alternative
+  to using HDFS with Parquet.
+- Strong but flexible consistency model, allowing you to choose consistency
+  requirements on a per-request basis, including the option for strict-serializable consistency.
+- Strong performance for running sequential and random workloads simultaneously.
+- Easy to administer and manage with Cloudera Manager.
+- High availability. Tablet Servers and Masters use the <<raft>>, which ensures that
+  as long as more than half the total number of replicas is available, the tablet is available for
+  reads and writes. For instance, if 2 out of 3 replicas or 3 out of 5 replicas are available, the tablet
+  is available.
++
+Reads can be serviced by read-only follower tablets, even in the event of a
+leader tablet failure.
+- Structured data model.
+
+By combining all of these properties, Kudu targets support for families of
+applications that are difficult or impossible to implement on current generation
+Hadoop storage technologies. A few examples of applications for which Kudu is a great
+solution are:
+
+* Reporting applications where newly-arrived data needs to be immediately available for end users
+* Time-series applications that must simultaneously support:
+  - queries across large amounts of historic data
+  - granular queries about an individual entity that must return very quickly
+* Applications that use predictive models to make real-time decisions with periodic
+refreshes of the predictive model based on all historic data
+
+For more information about these and other scenarios, see <<kudu_use_cases>>.
 
-link:developing.html[Developing Applications With Kudu]::
-  Get information about developing with the Kudu APIs and links to working example code.
+== Concepts and Terms
+[[kudu_columnar_data_store]]
+.Columnar Data Store
 
-link:schema_design.html[Kudu Schema Design]::
-  Learn about designing Kudu table schemas.
+Kudu is a _columnar data store_. A columnar data store stores data in strongly-typed
+columns. With a proper design, it is superior for analytical or data warehousing
+workloads for several reasons.
 
-link:transaction_semantics.html[Kudu Transaction Semantics]::
-  Information about transaction semantics in Kudu.
+Read Efficiency:: For analytical queries, you can read a single column, or a portion
+of that column, while ignoring other columns. This means you can fulfill your query
+while reading a minimal number of blocks on disk. With a row-based store, you need
+to read the entire row, even if you only return values from a few columns.
 
-link:contributing.html[Contributing to Kudu]::
-  Get involved in the Kudu community.
+Data Compression:: Because a given column contains only one type of data, pattern-based
+compression can be orders of magnitude more efficient than compressing mixed data
+types. Combined with the efficiencies of reading data from columns,  compression allows
+you to fulfill your query while reading even fewer blocks from disk. See
+link:schema_design.html#encoding[Data Compression]
+
+.Table
+
+A _table_ is where your data is stored in Kudu. A table has a schema and
+a totally ordered primary key. A table is split into segments called tablets.
+
+.Tablet
+
+A _tablet_ is a contiguous segment of a table. A given tablet is
+replicated on multiple tablet servers, and one of these replicas is considered
+the leader tablet. Any replica can service reads, and writes require consensus
+among the set of tablet servers serving the tablet.
+
+.Tablet Server
+
+A _tablet server_ stores and serves tablets to clients. For a
+given tablet, one tablet server serves the lead tablet, and the others serve
+follower replicas of that tablet. Only leaders service write requests, while
+leaders or followers each service read requests. Leaders are elected using
+<<raft>>. One tablet server can serve multiple tablets, and one tablet can be served
+by multiple tablet servers.
+
+.Master
+
+The _master_ keeps track of all the tablets, tablet servers, the
+<<catalog_table>>, and other metadata related to the cluster. At a given point
+in time, there can only be one acting master (the leader). If the current leader
+disappears, a new master is elected using <<raft>>.
+
+The master also coordinates metadata operations for clients. For example, when
+creating a new table, the client internally sends an RPC to the master. The
+master writes the metadata for the new table into the catalog table, and
+coordinates the process of creating tablets on the tablet servers.
+
+All the master's data is stored in a tablet, which can be replicated to all the
+other candidate masters.
+
+Tablet servers heartbeat to the master at a set interval (the default is once
+per second).
+
+[[raft]]
+.Raft Consensus Algorithm
+
+Kudu uses the link:https://raft.github.io/[Raft consensus algorithm] as
+a means to guarantee fault-tolerance and consistency, both for regular tablets and for master
+data. Through Raft, multiple replicas of a tablet elect a _leader_, which is responsible
+for accepting and replicating writes to _follower_ replicas. Once a write is persisted
+in a majority of replicas it is acknowledged to the client. A given group of `N` replicas
+(usually 3 or 5) is able to accept writes with at most `(N - 1)/2` faulty replicas.
+
+[[catalog_table]]
+.Catalog Table
+
+The _catalog table_ is the central location for
+metadata of Kudu. It stores information about tables and tablets. The catalog
+table is accessible to clients via the master, using the client API.
+
+Tables:: table schemas, locations, and states
+
+Tablets:: the list of existing tablets, which tablet servers have replicas of
+each tablet, the tablet's current state, and start and end keys.
+
+.Logical Replication
+
+Kudu replicates operations, not on-disk data. This is referred to as _logical
+replication_, as opposed to _physical replication_. Physical operations, such as
+compaction, do not need to transmit the data over the network. This results in a
+substantial reduction in network traffic for heavy write scenarios.
+
+== Architectural Overview
+
+The following diagram shows a Kudu cluster with three masters and multiple tablet
+servers, each serving multiple tablets. It illustrates how Raft consensus is used
+to allow for both leaders and followers for both the masters and tablet servers. In
+addition, a tablet server can be a leader for some tablets, and a follower for others.
+Leaders are shown in gold, while followers are shown in blue.
+
+NOTE: Multiple masters are not supported during the Kudu beta period.
 
-link:style_guide.html[Kudu Documentation Style Guide]::
-  Get familiar with the guidelines for documentation contributions to the Kudu project.
+image::kudu-architecture-2.png[Kudu Architecture, 800]
+
+[[kudu_use_cases]]
+== Example Use Cases
+.Streaming Input with Near Real Time Availability
+
+A common challenge in data analysis is one where new data arrives rapidly and constantly,
+and the same data needs to be available in near real time for reads, scans, and
+updates. Kudu offers the powerful combination of fast inserts and updates with
+efficient columnar scans to enable real-time analytics use cases on a single storage layer.
 
-link:configuration_reference.html[Kudu Configuration Reference]::
-  Find out about individual Kudu configuration options.
+.Time-series application with widely varying access patterns
+
+A time-series schema is one in which data points are organized and keyed according
+to the time at which they occurred. This can be useful for investigating the
+performance of metrics over time or attempting to predict future behavior based
+on past data. For instance, time-series customer data might be used both to store
+purchase click-stream history and to predict future purchases, or for use by a
+customer support representative. While these different types of analysis are occurring,
+inserts and mutations may also be occurring individually and in bulk, and become available
+immediately to read workloads. Kudu can handle all of these access patterns
+simultaneously in a scalable and efficient manner.
 
-++++
-</div>
-++++
+Kudu is a good fit for time-series workloads for several reasons. With Kudu's support for
+hash-based partitioning, combined with its native support for compound row keys, it is
+simple to set up a table spread across many servers without the risk of "hotspotting"
+that is commonly observed when range partitioning is used. Kudu's columnar storage engine
+is also beneficial in this context, because many time-series workloads read only a few columns,
+as opposed to the whole row.
+
+In the past, you might have needed to use multiple data stores to handle different
+data access patterns. This practice adds complexity to your application and operations, and
+duplicates storage. Kudu can handle all of these access patterns natively and efficiently,
+without the need to off-load work to other data stores.
+
+.Predictive Modeling
+
+Data analysts often develop predictive learning models from large sets of data. The
+model and the data may need to be updated or modified often as the learning takes
+place or as the situation being modeled changes. In addition, the scientist may want
+to change one or more factors in the model to see what happens over time. Updating
+a large set of data stored in files in HDFS is resource-intensive, as each file needs
+to be completely rewritten. In Kudu, updates happen in near real time. The scientist
+can tweak the value, re-run the query, and refresh the graph in seconds or minutes,
+rather than hours or days. In addition, batch or incremental algorithms can be run
+across the data at any time, with near-real-time results.
+
+.Combining Data In Kudu With Legacy Systems
+
+Companies generate data from multiple sources and store it in a variety of systems
+and formats. For instance, some of your data may be stored in Kudu, some in a traditional
+RDBMS, and some in files in HDFS. You can access and query all of these sources and
+formats using Impala, without the need to change your legacy systems.
+
+== Next Steps
+- link:quickstart.html[Get Started With Kudu]
+- link:installation.html[Installing Kudu]

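The Raft paragraph added to docs/index.adoc above says that a group of `N` replicas accepts writes with at most `(N - 1)/2` faulty replicas, and that a tablet stays available while more than half of its replicas are available. The arithmetic can be sketched as follows. This is an illustration only, not part of the commit and not Kudu code; the function names are hypothetical, and reading `(N - 1)/2` as integer division is an assumption (it is exact for the odd group sizes 3 and 5 that the text mentions).

```python
def majority(n_replicas: int) -> int:
    """Smallest replica count that is more than half of the group."""
    return n_replicas // 2 + 1


def max_faulty(n_replicas: int) -> int:
    """Most replicas that can fail while a majority remains.

    Assumption: (N - 1)/2 from the text is read as integer division.
    """
    return (n_replicas - 1) // 2


def tablet_available(n_replicas: int, n_healthy: int) -> bool:
    """True when the healthy replicas still form a majority."""
    return n_healthy >= majority(n_replicas)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"N={n}: majority={majority(n)}, tolerated failures={max_faulty(n)}")
```

With these definitions, `tablet_available(3, 2)` and `tablet_available(5, 3)` both hold, matching the "2 out of 3" and "3 out of 5" cases in the text, while `tablet_available(5, 2)` does not.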
http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/095b481e/docs/introduction.adoc
----------------------------------------------------------------------
diff --git a/docs/introduction.adoc b/docs/introduction.adoc
deleted file mode 100644
index 20a5b08..0000000
--- a/docs/introduction.adoc
+++ /dev/null
@@ -1,220 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-[[introduction]]
-= Introducing Apache Kudu (incubating)
-:author: Kudu Team
-:imagesdir: ./images
-:icons: font
-:toc: left
-:toclevels: 3
-:doctype: book
-:backend: html5
-:sectlinks:
-:experimental:
-
-Kudu is a columnar storage manager developed for the Hadoop platform.  Kudu shares
-the common technical properties of Hadoop ecosystem applications: it runs on commodity
-hardware, is horizontally scalable, and supports highly available operation.
-
-Kudu's design sets it apart. Some of Kudu's benefits include:
-
-- Fast processing of OLAP workloads.
-- Integration with MapReduce, Spark and other Hadoop ecosystem components.
-- Tight integration with Cloudera Impala, making it a good, mutable alternative
-  to using HDFS with Parquet.
-- Strong but flexible consistency model, allowing you to choose consistency
-  requirements on a per-request basis, including the option for strict-serializable consistency.
-- Strong performance for running sequential and random workloads simultaneously.
-- Easy to administer and manage with Cloudera Manager.
-- High availability. Tablet Servers and Masters use the <<raft>>, which ensures that
-  as long as more than half the total number of replicas is available, the tablet is available for
-  reads and writes. For instance, if 2 out of 3 replicas or 3 out of 5 replicas are available, the tablet
-  is available.
-+
-Reads can be serviced by read-only follower tablets, even in the event of a
-leader tablet failure.
-- Structured data model.
-
-By combining all of these properties, Kudu targets support for families of
-applications that are difficult or impossible to implement on current generation
-Hadoop storage technologies. A few examples of applications for which Kudu is a great
-solution are:
-
-* Reporting applications where newly-arrived data needs to be immediately available for end users
-* Time-series applications that must simultaneously support:
-  - queries across large amounts of historic data
-  - granular queries about an individual entity that must return very quickly
-* Applications that use predictive models to make real-time decisions with periodic
-refreshes of the predictive model based on all historic data
-
-For more information about these and other scenarios, see <<kudu_use_cases>>.
-
-== Concepts and Terms
-[[kudu_columnar_data_store]]
-.Columnar Data Store
-
-Kudu is a _columnar data store_. A columnar data store stores data in strongly-typed
-columns. With a proper design, it is superior for analytical or data warehousing
-workloads for several reasons.
-
-Read Efficiency:: For analytical queries, you can read a single column, or a portion
-of that column, while ignoring other columns. This means you can fulfill your query
-while reading a minimal number of blocks on disk. With a row-based store, you need
-to read the entire row, even if you only return values from a few columns.
-
-Data Compression:: Because a given column contains only one type of data, pattern-based
-compression can be orders of magnitude more efficient than compressing mixed data
-types. Combined with the efficiencies of reading data from columns,  compression allows
-you to fulfill your query while reading even fewer blocks from disk. See
-link:schema_design.html#encoding[Data Compression]
-
-.Table
-
-A _table_ is where your data is stored in Kudu. A table has a schema and
-a totally ordered primary key. A table is split into segments called tablets.
-
-.Tablet
-
-A _tablet_ is a contiguous segment of a table. A given tablet is
-replicated on multiple tablet servers, and one of these replicas is considered
-the leader tablet. Any replica can service reads, and writes require consensus
-among the set of tablet servers serving the tablet.
-
-.Tablet Server
-
-A _tablet server_ stores and serves tablets to clients. For a
-given tablet, one tablet server serves the lead tablet, and the others serve
-follower replicas of that tablet. Only leaders service write requests, while
-leaders or followers each service read requests. Leaders are elected using
-<<raft>>. One tablet server can serve multiple tablets, and one tablet can be served
-by multiple tablet servers.
-
-.Master
-
-The _master_ keeps track of all the tablets, tablet servers, the
-<<catalog_table>>, and other metadata related to the cluster. At a given point
-in time, there can only be one acting master (the leader). If the current leader
-disappears, a new master is elected using <<raft>>.
-
-The master also coordinates metadata operations for clients. For example, when
-creating a new table, the client internally sends an RPC to the master. The
-master writes the metadata for the new table into the catalog table, and
-coordinates the process of creating tablets on the tablet servers.
-
-All the master's data is stored in a tablet, which can be replicated to all the
-other candidate masters.
-
-Tablet servers heartbeat to the master at a set interval (the default is once
-per second).
-
-[[raft]]
-.Raft Consensus Algorithm
-
-Kudu uses the link:https://raft.github.io/[Raft consensus algorithm] as
-a means to guarantee fault-tolerance and consistency, both for regular tablets and for master
-data. Through Raft, multiple replicas of a tablet elect a _leader_, which is responsible
-for accepting and replicating writes to _follower_ replicas. Once a write is persisted
-in a majority of replicas it is acknowledged to the client. A given group of `N` replicas
-(usually 3 or 5) is able to accept writes with at most `(N - 1)/2` faulty replicas.
-
-[[catalog_table]]
-.Catalog Table
-
-The _catalog table_ is the central location for
-metadata of Kudu. It stores information about tables and tablets. The catalog
-table is accessible to clients via the master, using the client API.
-
-Tables:: table schemas, locations, and states
-
-Tablets:: the list of existing tablets, which tablet servers have replicas of
-each tablet, the tablet's current state, and start and end keys.
-
-.Logical Replication
-
-Kudu replicates operations, not on-disk data. This is referred to as _logical
-replication_, as opposed to _physical replication_. Physical operations, such as
-compaction, do not need to transmit the data over the network. This results in a
-substantial reduction in network traffic for heavy write scenarios.
-
-== Architectural Overview
-
-The following diagram shows a Kudu cluster with three masters and multiple tablet
-servers, each serving multiple tablets. It illustrates how Raft consensus is used
-to allow for both leaders and followers for both the masters and tablet servers. In
-addition, a tablet server can be a leader for some tablets, and a follower for others.
-Leaders are shown in gold, while followers are shown in blue.
-
-NOTE: Multiple masters are not supported during the Kudu beta period.
-
-image::kudu-architecture-2.png[Kudu Architecture, 800]
-
-[[kudu_use_cases]]
-== Example Use Cases
-.Streaming Input with Near Real Time Availability
-
-A common challenge in data analysis is one where new data arrives rapidly and constantly,
-and the same data needs to be available in near real time for reads, scans, and
-updates. Kudu offers the powerful combination of fast inserts and updates with
-efficient columnar scans to enable real-time analytics use cases on a single storage layer.
-
-.Time-series application with widely varying access patterns
-
-A time-series schema is one in which data points are organized and keyed according
-to the time at which they occurred. This can be useful for investigating the
-performance of metrics over time or attempting to predict future behavior based
-on past data. For instance, time-series customer data might be used both to store
-purchase click-stream history and to predict future purchases, or for use by a
-customer support representative. While these different types of analysis are occurring,
-inserts and mutations may also be occurring individually and in bulk, and become available
-immediately to read workloads. Kudu can handle all of these access patterns
-simultaneously in a scalable and efficient manner.
-
-Kudu is a good fit for time-series workloads for several reasons. With Kudu's support for
-hash-based partitioning, combined with its native support for compound row keys, it is
-simple to set up a table spread across many servers without the risk of "hotspotting"
-that is commonly observed when range partitioning is used. Kudu's columnar storage engine
-is also beneficial in this context, because many time-series workloads read only a few columns,
-as opposed to the whole row.
-
-In the past, you might have needed to use multiple data stores to handle different
-data access patterns. This practice adds complexity to your application and operations, and
-duplicates storage. Kudu can handle all of these access patterns natively and efficiently,
-without the need to off-load work to other data stores.
-
-.Predictive Modeling
-
-Data analysts often develop predictive learning models from large sets of data. The
-model and the data may need to be updated or modified often as the learning takes
-place or as the situation being modeled changes. In addition, the scientist may want
-to change one or more factors in the model to see what happens over time. Updating
-a large set of data stored in files in HDFS is resource-intensive, as each file needs
-to be completely rewritten. In Kudu, updates happen in near real time. The scientist
-can tweak the value, re-run the query, and refresh the graph in seconds or minutes,
-rather than hours or days. In addition, batch or incremental algorithms can be run
-across the data at any time, with near-real-time results.
-
-.Combining Data In Kudu With Legacy Systems
-
-Companies generate data from multiple sources and store it in a variety of systems
-and formats. For instance, some of your data may be stored in Kudu, some in a traditional
-RDBMS, and some in files in HDFS. You can access and query all of these sources and
-formats using Impala, without the need to change your legacy systems.
-
-== Next Steps
-- link:quickstart.html[Get Started With Kudu]
-- link:installation.html[Installing Kudu]

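The time-series use case in the documentation text above claims that hash-based partitioning on a compound row key avoids the "hotspotting" commonly seen with range partitioning. A minimal simulation of that claim is sketched below. It is not part of the commit and does not use the Kudu API: `zlib.crc32` is only a stand-in for whatever hash function Kudu applies, and the tablet count, range boundaries, and `host-N` key values are hypothetical.

```python
import zlib
from collections import Counter

NUM_TABLETS = 4  # hypothetical tablet count for the illustration


def range_tablet(timestamp, boundaries):
    """Index of the first range whose upper bound exceeds the timestamp.

    Timestamps beyond every boundary land in the final, unbounded range.
    """
    for i, upper in enumerate(boundaries):
        if timestamp < upper:
            return i
    return len(boundaries)


def hash_tablet(host, num_tablets=NUM_TABLETS):
    """Bucket a row by hashing the host part of a (host, timestamp) key.

    crc32 is a stand-in hash, not the function Kudu uses.
    """
    return zlib.crc32(host.encode("utf-8")) % num_tablets


# Newly arriving rows: 8 hosts reporting at 100 consecutive recent timestamps.
rows = [(f"host-{h}", 1_000_000 + t) for t in range(100) for h in range(8)]

# Three boundaries split the timestamp space into four ranges.
boundaries = [250_000, 500_000, 750_000]

range_load = Counter(range_tablet(ts, boundaries) for _, ts in rows)
hash_load = Counter(hash_tablet(host) for host, _ in rows)

if __name__ == "__main__":
    print("range partitioning load per tablet:", dict(range_load))
    print("hash partitioning load per tablet: ", dict(hash_load))
```

Because every new timestamp is larger than the last boundary, range partitioning on time sends all incoming rows to the final tablet, whereas hashing the host column spreads the same rows over several tablets.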
http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/095b481e/docs/style_guide.adoc
----------------------------------------------------------------------
diff --git a/docs/style_guide.adoc b/docs/style_guide.adoc
index bab3842..d7b34ab 100644
--- a/docs/style_guide.adoc
+++ b/docs/style_guide.adoc
@@ -68,7 +68,7 @@ during a local build.
 
 To view the HTML, open _docs/index.html_ in your local browser.
 
-You can also build only a single chapter. such as _introduction.adoc_, by passing its name instead.
+You can also build only a single chapter. such as _release_notes.adoc_, by passing its name instead.
 
 == Asciidoc Style Guide
 Asciidoc supports a lot of syntax that we do not need to use. When possible, stick

