accumulo-commits mailing list archives

From mwa...@apache.org
Subject [accumulo-website] branch master updated: Improved design documentation of tablet server (#49)
Date Mon, 18 Dec 2017 20:31:56 GMT
This is an automated email from the ASF dual-hosted git repository.

mwalch pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 8da2b18  Improved design documentation of tablet server (#49)
8da2b18 is described below

commit 8da2b1823e704ef8e0cc251142806e6e9a94dca6
Author: Mike Walch <mwalch@apache.org>
AuthorDate: Mon Dec 18 15:31:55 2017 -0500

    Improved design documentation of tablet server (#49)
---
 _docs-2-0/administration/caching.md |  25 ++++++++++++++-----------
 _docs-2-0/getting-started/design.md |  17 +++++++++++------
 images/docs/tablet_server.png       | Bin 0 -> 51783 bytes
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/_docs-2-0/administration/caching.md b/_docs-2-0/administration/caching.md
index 7c5ce52..bdd591a 100644
--- a/_docs-2-0/administration/caching.md
+++ b/_docs-2-0/administration/caching.md
@@ -4,26 +4,27 @@ category: administration
 order: 11
 ---
 
-Accumulo tablet servers have a **block cache** that buffers data in memory to limit reads from disk.
+Accumulo [tablet servers][tserver] have **block caches** that buffer data in memory to limit reads from disk.
 This caching has the following benefits:
 
 * reduces latency when reading data
 * helps alleviate hotspots in tables
 
-The block cache stores index and data blocks. A typical Accumulo read operation will perform a binary search
-over several index blocks followed by a linear scan of one or more data blocks. Each tablet server
-has its own block cache that is shared by all hosted tablets. Therefore, block caches are only enabled
+Each tablet server has an index and data block cache that is shared by all hosted tablets (see the [tablet server diagram][tserver]
+to learn more). A typical Accumulo read operation will perform a binary search over several index blocks followed by a linear scan
+of one or more data blocks. If these blocks are not in a cache, they will need to be retrieved from [RFiles] in HDFS. While the index
+block cache is enabled for all tables, the data block cache has to be enabled for a table by the user. It is typically only enabled
 for tables where read performance is critical.
 
 ## Configuration
 
-While the block cache is enabled by default for the Accumulo metadata tables, it must be enabled
-for all other tables by setting the following table properties to `true`:
+The index and data block caches are configured for tables by the following properties:
 
-* [table.cache.block.enable] - enables data block cache on the table
-* [table.cache.index.enable] - enables index block cache on the table
+* [table.cache.block.enable] - enables data block cache on the table (default is `false`)
+* [table.cache.index.enable] - enables index block cache on the table (default is `true`)
 
-These properties can be set in the Accumulo shell using the following command:
+While the index block cache is enabled by default for all Accumulo tables, users must enable the data block cache by
+settting [table.cache.block.enable] to `true` in the shell:
 
     config -t mytable -s table.cache.block.enable=true
 
@@ -33,12 +34,14 @@ Or programatically using [TableOperations.setProperty()][tableops]:
 conn.tableOperations().setProperty("mytable", "table.cache.block.enable", "true");
 ```
 
-The sizes of the index and data block caches can be changed from their defaults by setting
-the following properties:
+The size of the index and data block caches (which are shared by all tablets of tablet server) can be changed from
+their defaults by setting the following properties:
 
 * [tserver.cache.data.size]
 * [tserver.cache.index.size]
 
+[tserver]: {{ page.docs_baseurl }}/getting-started/design#tablet-server-1
+[RFiles]: {{ page.docs_baseurl}}/getting-started/design#rfile
 [table.cache.block.enable]: {{ page.docs_baseurl }}/administration/properties#table_cache_block_enable
 [table.cache.index.enable]: {{ page.docs_baseurl }}/administration/properties#table_cache_index_enable
 [tserver.cache.data.size]: {{ page.docs_baseurl }}/administration/properties#tserver_cache_data_size
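The caching behavior documented in caching.md (one index cache and one data cache per tablet server, shared by all of its tablets, with the index cache on by default and the data cache off by default) can be modeled in a few lines of Java. This is a toy sketch, not Accumulo's block cache implementation: `TableCacheConfig`, `LruBlockCacheSketch`, and `TabletServerCacheSketch` are hypothetical names, LRU eviction is an assumption of the sketch, and capacity is counted in blocks rather than the byte sizes set by `tserver.cache.data.size` and `tserver.cache.index.size`.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-table flags mirroring table.cache.index.enable / table.cache.block.enable. */
final class TableCacheConfig {
    boolean indexCacheEnabled = true;   // documented default: true
    boolean dataCacheEnabled = false;   // documented default: false
}

/** Minimal LRU cache (assumed policy); capacity is a block count, not bytes. */
final class LruBlockCacheSketch {
    private final Map<String, byte[]> blocks;

    LruBlockCacheSketch(final int maxBlocks) {
        // accessOrder=true: iteration runs from least to most recently used
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxBlocks;  // evict the least recently used block
            }
        };
    }

    byte[] get(String blockId) { return blocks.get(blockId); }

    void put(String blockId, byte[] block) { blocks.put(blockId, block); }
}

/** One index cache and one data cache, shared by every tablet hosted on this server. */
final class TabletServerCacheSketch {
    private final LruBlockCacheSketch indexCache;
    private final LruBlockCacheSketch dataCache;
    private final Map<String, TableCacheConfig> tableConfigs = new HashMap<>();
    int fileReads = 0;  // counts simulated block reads from RFiles in HDFS

    TabletServerCacheSketch(int indexBlocks, int dataBlocks) {
        indexCache = new LruBlockCacheSketch(indexBlocks);
        dataCache = new LruBlockCacheSketch(dataBlocks);
    }

    TableCacheConfig config(String table) {
        return tableConfigs.computeIfAbsent(table, t -> new TableCacheConfig());
    }

    byte[] readBlock(String table, String blockId, boolean isIndexBlock) {
        TableCacheConfig cfg = config(table);
        boolean useCache = isIndexBlock ? cfg.indexCacheEnabled : cfg.dataCacheEnabled;
        LruBlockCacheSketch cache = isIndexBlock ? indexCache : dataCache;
        if (useCache) {
            byte[] cached = cache.get(blockId);
            if (cached != null) return cached;  // cache hit: no file read needed
        }
        byte[] block = readFromRFile(blockId);
        if (useCache) cache.put(blockId, block);
        return block;
    }

    private byte[] readFromRFile(String blockId) {
        fileReads++;  // stands in for fetching the block from an RFile in HDFS
        return blockId.getBytes(StandardCharsets.UTF_8);
    }
}
```

In a real cluster the per-table flags are the `table.cache.*.enable` properties set through the shell or `TableOperations.setProperty()` as shown in the diff; the sketch only illustrates why enabling the data cache turns repeated reads of the same data block into cache hits.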
diff --git a/_docs-2-0/getting-started/design.md b/_docs-2-0/getting-started/design.md
index 26e9048..7f6a880 100644
--- a/_docs-2-0/getting-started/design.md
+++ b/_docs-2-0/getting-started/design.md
@@ -36,7 +36,7 @@ one Master server and many Clients.
 The TabletServer manages some subset of all the tablets (partitions of tables). This includes receiving writes from clients, persisting writes to a
 write-ahead log, sorting new key-value pairs in memory, periodically
 flushing sorted key-value pairs to new files in HDFS, and responding
-to reads from clients, forming a merge-sorted view of all keys and
+to reads from clients, forming a sorted merge view of all keys and
 values from all the files it has created and the sorted in-memory
 store.
 
@@ -102,7 +102,7 @@ ingest and query load is balanced across the cluster.
 
 ![data distribution]({{ site.url }}/images/docs/data_distribution.png)
 
-## Tablet Service
+## Tablet Server
 
 When a write arrives at a TabletServer it is written to a Write-Ahead Log and
 then inserted into a sorted data structure in memory called a MemTable. When the
@@ -112,10 +112,14 @@ called a minor compaction. A new MemTable is then created and the fact of the
 compaction is recorded in the Write-Ahead Log.
 
 When a request to read data arrives at a TabletServer, the TabletServer does a
-binary search across the MemTable as well as the in-memory indexes associated
-with each RFile to find the relevant values. If clients are performing a scan,
-several key-value pairs are returned to the client in order from the MemTable
-and the set of RFiles by performing a merge-sort as they are read.
+binary search across the MemTable as well as the index blocks associated with each RFile
+to find the relevant values. If clients are performing a scan, several key-value pairs
+are returned to the client in order from the MemTable and data blocks of RFiles by performing
+a sorted merge as they are read. If [caching] is enabled for the table, any index or data
+block is stored in the block cache to speed up future scans.
+
+![tablet server diagram]({{ site.url }}/images/docs/tablet_server.png)
+<!-- Source at https://docs.google.com/presentation/d/1yEBNM044FxrzksVfxU35WDbxcVWUYUMy3tgRP75dzus/edit?usp=sharing -->
 
 ## RFile
 
@@ -178,3 +182,4 @@ TabletServer failures are noted on the Master's monitor page, accessible via
 [clients]: {{page.docs_baseurl}}/getting-started/clients
 [merging]: {{page.docs_baseurl}}/getting-started/table_configuration#merging-tablets
 [compaction]: {{page.docs_baseurl}}/getting-started/table_configuration#compaction
+[caching]: {{page.docs_baseurl}}/administration/caching
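The read path described in both files (a binary search over index blocks, followed by a linear scan of a data block) can be illustrated with a single-level index. `RFileSketch` is hypothetical and is not the RFile format: real RFiles may use multi-level indexes and store full Accumulo keys, whereas here keys are plain strings, the input is assumed to be sorted already, and each index entry is assumed to be the last key of its data block.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical file with sorted data blocks and a one-level index of each block's last key. */
final class RFileSketch {
    private final List<List<String>> dataBlocks = new ArrayList<>();
    private final List<String> index = new ArrayList<>();  // index.get(i) = last key of block i

    /** Splits already-sorted keys into blocks of blockSize keys. */
    RFileSketch(List<String> sortedKeys, int blockSize) {
        for (int i = 0; i < sortedKeys.size(); i += blockSize) {
            int end = Math.min(i + blockSize, sortedKeys.size());
            List<String> block = new ArrayList<>(sortedKeys.subList(i, end));
            dataBlocks.add(block);
            index.add(block.get(block.size() - 1));
        }
    }

    int blockCount() { return dataBlocks.size(); }

    /** Binary search over the index: first block whose last key >= target, or -1 if none. */
    int findBlock(String target) {
        int lo = 0, hi = index.size() - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid).compareTo(target) >= 0) {
                found = mid;      // candidate; keep looking left for an earlier block
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return found;
    }

    /** Linear scan of the single data block the index points to. */
    boolean contains(String target) {
        int b = findBlock(target);
        if (b < 0) return false;
        for (String key : dataBlocks.get(b)) {
            int cmp = key.compareTo(target);
            if (cmp == 0) return true;
            if (cmp > 0) return false;  // keys are sorted, so stop early
        }
        return false;
    }
}
```

Only one data block is touched per lookup, which is why caching index blocks (small, consulted on every read) and data blocks (larger, consulted per matching block) are configured separately.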
diff --git a/images/docs/tablet_server.png b/images/docs/tablet_server.png
new file mode 100644
index 0000000..2581dd0
Binary files /dev/null and b/images/docs/tablet_server.png differ
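The tablet server behavior described in design.md (log each write, insert it into a sorted MemTable, flush the MemTable to a new file in a minor compaction, and answer scans with a sorted merge of the MemTable and all files) can be sketched as a toy model. `TabletSketch` is hypothetical: the write-ahead log is an in-memory list, "files" are immutable sorted maps rather than RFiles in HDFS, the flush threshold is an entry count, and when a key appears in several sources the newest source wins, a simplification of Accumulo's timestamp-based versioning.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical toy tablet: WAL + MemTable + flushed files, read by a sorted merge. */
final class TabletSketch {
    private final List<String> writeAheadLog = new ArrayList<>();  // stands in for the WAL
    private TreeMap<String, String> memTable = new TreeMap<>();     // sorted in-memory store
    private final Deque<SortedMap<String, String>> files = new ArrayDeque<>();  // newest first
    private final int memTableLimit;  // hypothetical flush threshold, in entries

    TabletSketch(int memTableLimit) { this.memTableLimit = memTableLimit; }

    void write(String key, String value) {
        writeAheadLog.add("PUT " + key + "=" + value);  // 1. log the write first
        memTable.put(key, value);                        // 2. then insert into the MemTable
        if (memTable.size() >= memTableLimit) minorCompact();
    }

    /** Minor compaction: freeze the MemTable as a new file, start an empty one, log it. */
    void minorCompact() {
        if (memTable.isEmpty()) return;
        files.addFirst(Collections.unmodifiableSortedMap(memTable));
        memTable = new TreeMap<>();
        writeAheadLog.add("MINOR_COMPACTION");
    }

    int fileCount() { return files.size(); }

    int walEntries() { return writeAheadLog.size(); }

    /** Current head of one source during the merge; a lower source index means newer data. */
    private static final class Head {
        final Map.Entry<String, String> entry;
        final int source;
        Head(Map.Entry<String, String> entry, int source) { this.entry = entry; this.source = source; }
    }

    /** Sorted merge over the MemTable and all files; the newest source wins on duplicate keys. */
    List<Map.Entry<String, String>> scan() {
        List<Iterator<Map.Entry<String, String>>> sources = new ArrayList<>();
        sources.add(memTable.entrySet().iterator());
        for (SortedMap<String, String> file : files) sources.add(file.entrySet().iterator());

        PriorityQueue<Head> heap = new PriorityQueue<>(
            Comparator.comparing((Head h) -> h.entry.getKey()).thenComparingInt(h -> h.source));
        for (int i = 0; i < sources.size(); i++) advance(heap, sources, i);

        List<Map.Entry<String, String>> result = new ArrayList<>();
        String lastKey = null;
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            String key = head.entry.getKey();
            if (!key.equals(lastKey)) {  // first occurrence of a key comes from the newest source
                result.add(new AbstractMap.SimpleImmutableEntry<>(key, head.entry.getValue()));
                lastKey = key;
            }
            advance(heap, sources, head.source);  // refill from the source just consumed
        }
        return result;
    }

    private static void advance(PriorityQueue<Head> heap,
                                List<Iterator<Map.Entry<String, String>>> sources, int i) {
        Iterator<Map.Entry<String, String>> it = sources.get(i);
        if (it.hasNext()) heap.add(new Head(it.next(), i));
    }
}
```

A k-way merge with a priority queue reads each source in order exactly once, so the scan returns keys in sorted order without re-sorting the MemTable and files together.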

-- 
To stop receiving notification emails like this one, please contact
commits@accumulo.apache.org.
