accumulo-commits mailing list archives

Subject [2/2] accumulo-website git commit: Jekyll build from master:ddd5b72
Date Mon, 07 Aug 2017 21:42:12 GMT
Jekyll build from master:ddd5b72

ACCUMULO-4684 Add replication table schema to docs


Branch: refs/heads/asf-site
Commit: fcce417afae30a99dae2295d3cb8c6fc6c6e6873
Parents: c935469
Author: Josh Elser <>
Authored: Mon Aug 7 17:41:34 2017 -0400
Committer: Josh Elser <>
Committed: Mon Aug 7 17:41:34 2017 -0400

 docs/unreleased/administration/replication.html | 52 ++++++++++++++++++++
 feed.xml                                        |  4 +-
 2 files changed, 54 insertions(+), 2 deletions(-)
diff --git a/docs/unreleased/administration/replication.html b/docs/unreleased/administration/replication.html
index 4aca63b..63dec5f 100644
--- a/docs/unreleased/administration/replication.html
+++ b/docs/unreleased/administration/replication.html
@@ -760,6 +760,58 @@ Accumulo instance, it is trivial to copy those files to a new HDFS instance
 instance using the same process. Hadoop’s <code class="highlighter-rouge">distcp</code> command provides an easy way to copy large amounts of data to another
 HDFS instance which makes the problem of duplicating bulk imports very easy to solve.</p>
+<h2 id="table-schema">Table Schema</h2>
+<p>The following describes the kinds of keys, their format, and their general function for the purposes of individuals
+understanding what the replication table describes. Because the replication table is essentially a state machine,
+this data is often the source of truth for why Accumulo is doing what it is with respect to replication. There are
+three “sections” in this table: “repl”, “work”, and “order”.</p>
+<h3 id="repl-section">Repl section</h3>
+<p>This section is for the tracking of a WAL file that needs to be replicated to one or more Accumulo remote tables.
+This entry is tracking that replication needs to happen on the given WAL file, but also that the local Accumulo table,
+as specified by the column qualifier “local table ID”, has information in this WAL file.</p>
+<p>The structure of the key-value is as follows:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>&lt;HDFS_uri_to_WAL&gt; repl:&lt;local_table_id&gt; [] -&gt; &lt;protobuf&gt;
+</code></pre>
+</div>
+<p>This entry is created based on a replication entry from the Accumulo metadata table, and is deleted from the replication table
+when the WAL has been fully replicated to all remote Accumulo tables.</p>
+<h3 id="work-section">Work section</h3>
+<p>This section is for the tracking of a WAL file that needs to be replicated to a single Accumulo table in a remote
+Accumulo cluster. If a WAL must be replicated to multiple tables, there will be multiple entries. The Value for this
+Key is a serialized ProtocolBuffer message which encapsulates the portion of the WAL which was already sent for
+this file. The “replication target” is the unique location of where the file needs to be replicated: the identifier
+for the remote Accumulo cluster and the table ID in that remote Accumulo cluster. The protocol buffer in the value
+tracks the progress of replication to the remote cluster.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>&lt;HDFS_uri_to_WAL&gt; work:&lt;replication_target&gt; [] -&gt; &lt;protobuf&gt;
+</code></pre>
+</div>
+<p>The “work” entry is created when a WAL has an “order” entry, and deleted after the WAL is replicated to all
+necessary remote clusters.</p>
+<h3 id="order-section">Order section</h3>
+<p>This section is used to order and schedule (create) replication work. In some cases, data with the same timestamp
+may be provided multiple times. In this case, it is important that WALs are replicated in the same order they were
+created/used. In this case (and in cases where this is not important), the order entry ensures that oldest WALs
+are processed most quickly and pushed through the replication framework.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>&lt;time_of_WAL_closing&gt;\x00&lt;HDFS_uri_to_WAL&gt; order:&lt;local_table_id&gt; [] -&gt; &lt;protobuf&gt;
+</code></pre>
+</div>
+<p>The “order” entry is created when the WAL is closed (no longer being written to) and is removed when
+the WAL is fully replicated to all remote locations.</p>
     <div class="row" style="margin-top: 20px;">
       <div class="col-md-10"><strong>Find documentation for all releases in the <a href="/docs-archive">archive</a></strong></div>
diff --git a/feed.xml b/feed.xml
index aa6eac7..83a433d 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
     <atom:link href="" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 02 Aug 2017 13:44:25 -0400</pubDate>
-    <lastBuildDate>Wed, 02 Aug 2017 13:44:25 -0400</lastBuildDate>
+    <pubDate>Mon, 07 Aug 2017 17:41:16 -0400</pubDate>
+    <lastBuildDate>Mon, 07 Aug 2017 17:41:16 -0400</lastBuildDate>
     <generator>Jekyll v3.3.1</generator>
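
The three key layouts documented in the replication.html hunk above can be sketched in Python. This is an illustration under stated assumptions, not Accumulo's implementation: the helper names (ReplicationKey, repl_key, work_key, order_key), the example HDFS URIs, and the zero-padded millisecond encoding of the WAL closing time are all hypothetical. The documentation only specifies the row shape <time_of_WAL_closing>\x00<HDFS_uri_to_WAL> and does not say how the time is encoded. The replication target is treated as an opaque string for the same reason.

```python
from typing import NamedTuple


class ReplicationKey(NamedTuple):
    """Row, column family, and column qualifier of one replication-table entry."""
    row: str
    family: str
    qualifier: str


def repl_key(wal_uri: str, local_table_id: str) -> ReplicationKey:
    # <HDFS_uri_to_WAL> repl:<local_table_id>
    return ReplicationKey(wal_uri, "repl", local_table_id)


def work_key(wal_uri: str, replication_target: str) -> ReplicationKey:
    # <HDFS_uri_to_WAL> work:<replication_target>
    # One entry per (WAL, remote cluster + remote table) pair.
    return ReplicationKey(wal_uri, "work", replication_target)


def order_key(closed_millis: int, wal_uri: str, local_table_id: str) -> ReplicationKey:
    # <time_of_WAL_closing>\x00<HDFS_uri_to_WAL> order:<local_table_id>
    # Assumption: the time is written as a zero-padded decimal string so that
    # lexicographic row order matches chronological order.
    row = f"{closed_millis:020d}\x00{wal_uri}"
    return ReplicationKey(row, "order", local_table_id)


# Hypothetical WALs: "b" was closed before "a".
older = order_key(1502141000000, "hdfs://nn/accumulo/wal/b", "2")
newer = order_key(1502142000000, "hdfs://nn/accumulo/wal/a", "2")

# Sorting by row puts the WAL that closed first at the front,
# regardless of how the URIs themselves compare.
oldest_first = sorted([newer, older])
```

Because the closing time leads the row in the "order" section, a scan over that section in this sketch returns the oldest closed WALs first, which matches the scheduling behavior the documentation describes.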
