From commits-return-21985-archive-asf-public=cust-asf.ponee.io@accumulo.apache.org Tue Jul 17 16:18:50 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 6CA56180600 for ; Tue, 17 Jul 2018 16:18:49 +0200 (CEST) Received: (qmail 46930 invoked by uid 500); 17 Jul 2018 14:18:43 -0000 Mailing-List: contact commits-help@accumulo.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@accumulo.apache.org Delivered-To: mailing list commits@accumulo.apache.org Received: (qmail 46921 invoked by uid 99); 17 Jul 2018 14:18:43 -0000 Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Jul 2018 14:18:43 +0000 Received: by gitbox.apache.org (ASF Mail Server at gitbox.apache.org, from userid 33) id EF25080981; Tue, 17 Jul 2018 14:18:42 +0000 (UTC) Date: Tue, 17 Jul 2018 14:18:42 +0000 To: "commits@accumulo.apache.org" Subject: [accumulo-website] branch asf-site updated: Jekyll build from master:1fbf0a9 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Message-ID: <153183712289.5537.14038581790178868203@gitbox.apache.org> From: mwalch@apache.org X-Git-Host: gitbox.apache.org X-Git-Repo: accumulo-website X-Git-Refname: refs/heads/asf-site X-Git-Reftype: branch X-Git-Oldrev: e8b64f28876912ecbc23a786aa200f2217acf89c X-Git-Newrev: d57a43300042142115d1bd2c1e26ebab95ea1359 X-Git-Rev: d57a43300042142115d1bd2c1e26ebab95ea1359 X-Git-NotificationType: ref_changed_plus_diff X-Git-Multimail-Version: 1.5.dev Auto-Submitted: auto-generated This is an automated email from the ASF dual-hosted git repository. 
mwalch pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/accumulo-website.git The following commit(s) were added to refs/heads/asf-site by this push: new d57a433 Jekyll build from master:1fbf0a9 d57a433 is described below commit d57a43300042142115d1bd2c1e26ebab95ea1359 Author: Mike Walch AuthorDate: Tue Jul 17 10:18:02 2018 -0400 Jekyll build from master:1fbf0a9 Improved linking in Replication docs (#99) --- docs/2.0/administration/replication.html | 146 ++++++++++--------------------- feed.xml | 4 +- search_data.json | 2 +- 3 files changed, 49 insertions(+), 103 deletions(-) diff --git a/docs/2.0/administration/replication.html b/docs/2.0/administration/replication.html index 1a1ebdf..946081c 100644 --- a/docs/2.0/administration/replication.html +++ b/docs/2.0/administration/replication.html @@ -387,7 +387,7 @@ into the following sections.

Each system involved in replication (even the primary) needs a name that uniquely identifies it across all peers in the replication graph. This should be considered -fixed for an instance, and set in accumulo-site.xml.

+fixed for an instance, and set using replication.name in accumulo-site.xml.

<property>
     <name>replication.name</name>
@@ -463,52 +463,47 @@ Monitor server, using the Replication lin
 
 

Work Assignment

-

Depending on the schema of a table, different implementations of the WorkAssigner used could -be configured. The implementation is controlled via the property replication.work.assigner -and the full class name for the implementation. This can be configured via the shell or -accumulo-site.xml.

+

Depending on the schema of a table, different implementations of the WorkAssigner +used could be configured. The implementation is controlled via the property replication.work.assigner +and the full class name for the implementation. This can be configured via the shell or accumulo-site.xml.

-
<property>
-    <name>replication.work.assigner</name>
-    <value>org.apache.accumulo.master.replication.SequentialWorkAssigner</value>
-    <description>Implementation used to assign work for replication</description>
-</property>
-
- -
root@accumulo_primary> config -t my_table -s replication.work.assigner=org.apache.accumulo.master.replication.SequentialWorkAssigner
-
+

Two implementations of WorkAssigner are provided:

-

Two implementations are provided. By default, the SequentialWorkAssigner is configured for an -instance. The SequentialWorkAssigner ensures that, per peer and each remote identifier, each WAL is -replicated in the order in which they were created. This is sufficient to ensure that updates to a table -will be replayed in the correct order on the peer. This implementation has the downside of only replicating -a single WAL at a time.

- -

The second implementation, the UnorderedWorkAssigner can be used to overcome the limitation +

    +
  1. +

    The UnorderedWorkAssigner can be used to overcome the limitation of only a single WAL being replicated to a target and peer at any time. Depending on the table schema, it’s possible that multiple versions of the same Key with different values are infrequent or nonexistent. In this case, parallel replication to a peer and target is possible without any downsides. If this implementation is used where column updates are frequent, the primary and the peer may become inconsistent.

    +
  2. +
  3. +

    The SequentialWorkAssigner is configured for an +instance. The SequentialWorkAssigner ensures that, per peer and each remote identifier, each WAL is +replicated in the order in which they were created. This is sufficient to ensure that updates to a table +will be replayed in the correct order on the peer. This implementation has the downside of only replicating +a single WAL at a time.

    +
  4. +
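The trade-off between the two assigners can be illustrated with a toy model. This is only a sketch under stated assumptions: each WAL is modeled as a list of (key, value) updates, and the peer keeps whichever value was applied last. The `replay` function and the sample keys are hypothetical, and the model ignores timestamps and versioning, so it is not how Accumulo itself applies replicated data.

```python
# Toy last-write-wins model (an assumption for illustration only, not
# Accumulo's implementation): a WAL is a list of (key, value) updates.

def replay(wals):
    """Apply WALs in the given order and return the peer's final state."""
    state = {}
    for wal in wals:
        for key, value in wal:
            state[key] = value  # a later application overwrites an earlier one
    return state

wal_1 = [("row1:cf:cq", "v1")]
wal_2 = [("row1:cf:cq", "v2")]  # later update to the same key

# Order of creation, as the sequential assigner guarantees.
in_order = replay([wal_1, wal_2])

# One ordering that parallel (unordered) replication could produce.
out_of_order = replay([wal_2, wal_1])

# When no key is updated twice, the order does not matter.
disjoint = replay([[("a", "1")], [("b", "2")]])
disjoint_swapped = replay([[("b", "2")], [("a", "1")]])
```

In this model the out-of-order replay leaves the peer holding the older value for the repeated key, while tables whose keys are never rewritten end in the same state either way.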

ReplicaSystems

-

ReplicaSystem is the interface which allows abstraction of replication of data -to peers of various types. Presently, only an AccumuloReplicaSystem is provided -which will replicate data to another Accumulo instance. A ReplicaSystem implementation -is run inside of the TabletServer process, and can be configured as mentioned in the -Instance Configuration section of this document. Theoretically, an implementation -of this interface could send data to other filesystems, databases, etc.

+

ReplicaSystem is the interface which allows abstraction of replication of data +to peers of various types. Presently, only an AccumuloReplicaSystem is provided +which will replicate data to another Accumulo instance. A ReplicaSystem implementation +is run inside of the TabletServer process, and can be configured as mentioned in Instance Configuration +section of this document. Theoretically, an implementation of this interface could send data to other filesystems, databases, etc.

AccumuloReplicaSystem

-

The AccumuloReplicaSystem uses Thrift to communicate with a peer Accumulo instance +

The AccumuloReplicaSystem uses Thrift to communicate with a peer Accumulo instance and replicate the necessary data. The TabletServer running on the primary will communicate with the Master on the peer to request the address of a TabletServer on the peer which this TabletServer will use to replicate the data.

The TabletServer on the primary will then replicate data in batches of a configurable -size (replication.max.unit.size). The TabletServer on the peer will report how many +size (replication.max.unit.size). The TabletServer on the peer will report how many records were applied back to the primary, which will be used to record how many records were successfully replicated. The TabletServer on the primary will continue to replicate data in these batches until no more data can be read from the file.
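The batching loop described above might be sketched as follows. This is a minimal sketch, not the AccumuloReplicaSystem code: `replicate_in_batches`, `send_batch`, and `fake_peer` are hypothetical names, records are modeled as byte strings, and `max_unit_size` stands in for the value of replication.max.unit.size expressed in bytes.

```python
def replicate_in_batches(records, max_unit_size, send_batch):
    """Send records in batches of at most max_unit_size bytes.

    send_batch is a hypothetical stand-in for the RPC to the peer; it
    returns the number of records the peer reports as applied.
    A single record larger than max_unit_size is sent in a batch of its own.
    Returns the total number of records reported as applied.
    """
    replicated = 0
    batch, batch_size = [], 0
    for record in records:
        # Flush the current batch if adding this record would exceed the limit.
        if batch and batch_size + len(record) > max_unit_size:
            replicated += send_batch(batch)
            batch, batch_size = [], 0
        batch.append(record)
        batch_size += len(record)
    # Send whatever remains once no more data can be read.
    if batch:
        replicated += send_batch(batch)
    return replicated


sent = []

def fake_peer(batch):
    """Pretend peer that applies every record and reports the count."""
    sent.append(list(batch))
    return len(batch)

total = replicate_in_batches([b"aaaa", b"bbbb", b"cc", b"dddddd"], 8, fake_peer)
```

Summing the counts reported by the peer, rather than the counts sent, mirrors the text: the primary records progress based on what the peer says it applied.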

@@ -518,77 +513,28 @@ data in these batches until no more data can be read from the file.

There are a number of configuration values that can be used to control how the implementation of various components operate.

- Property | Description | Default
- replication.max.work.queue | Maximum number of files queued for replication at one time | 1000
- replication.work.assignment.sleep | Time between invocations of the WorkAssigner | 30s
- replication.worker.threads | Size of threadpool used to replicate data to peers | 4
- replication.receipt.service.port | Thrift service port to listen for replication requests, can use ‘0’ for a random port | 10002
- replication.work.attempts | Number of attempts to replicate to a peer before aborting the attempt | 10
- replication.receiver.min.threads | Minimum number of idle threads for handling incoming replication | 1
- replication.receiver.threadcheck.time | Time between attempting adjustments of thread pool for incoming replications | 30s
- replication.max.unit.size | Maximum amount of data to be replicated in one RPC | 64M
- replication.work.assigner | Work Assigner implementation | org.apache.accumulo.master.replication.SequentialWorkAssigner
- tserver.replication.batchwriter.replayer.memory | Size of BatchWriter cache to use in applying replication requests | 50M
+

Example Practical Configuration

A real-life example is now provided to give concrete application of replication configuration. This example is a two instance Accumulo system, one primary system and one peer system. They are called -primary and peer, respectively. Each system also has a table of the same name, “my_table”. The instance -name for each is also the same (primary and peer), and both have ZooKeeper hosts on a node with a hostname +primary and peer, respectively. Each system also has a table of the same name, my_table. The instance +name for each is also the same (primary and peer), and both have ZooKeeper hosts on a node with a hostname with that name as well (primary:2181 and peer:2181).

-

We want to configure these systems so that “my_table” on “primary” replicates to “my_table” on “peer”.

+

We want to configure these systems so that my_table on primary replicates to my_table on peer.

accumulo-site.xml

@@ -600,7 +546,6 @@ in replication together. In this example, we will use the names provided in the
<property>
   <name>replication.name</name>
   <value>primary</value>
-  <description>Defines the unique name</description>
 </property>
 
@@ -646,10 +591,10 @@ root@peer> tables -l

Define the Peer as a replication peer to the Primary

-

We’re defining the instance with replication.name of ‘peer’ as a peer. We provide the implementation of ReplicaSystem -that we want to use, and the configuration for the AccumuloReplicaSystem. In this case, the configuration is the Accumulo -Instance name for ‘peer’ and the ZooKeeper quorum string. The configuration key is of the form -“replication.peer.$peer_name”.

+

We’re defining the instance with replication.name of peer as a peer. We provide the implementation of ReplicaSystem +that we want to use, and the configuration for the AccumuloReplicaSystem. In this case, the configuration is the Accumulo +Instance name for peer and the ZooKeeper quorum string. The configuration key is of the form +replication.peer.$peer_name.

root@primary> config -s replication.peer.peer=org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,peer,$peer_zk_quorum
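The shape of this property can be made explicit with a small helper. This is only an illustrative sketch: `peer_property` is a hypothetical function, not part of Accumulo, and the quorum value `peer:2181` is taken from the example hosts named earlier in this document.

```python
ACCUMULO_REPLICA_SYSTEM = (
    "org.apache.accumulo.tserver.replication.AccumuloReplicaSystem"
)

def peer_property(peer_name, instance_name, zk_quorum,
                  replica_system=ACCUMULO_REPLICA_SYSTEM):
    """Build the key and value for a replication.peer.$peer_name property.

    The value is the ReplicaSystem class name followed by its configuration,
    here the peer's instance name and ZooKeeper quorum, comma-separated.
    """
    key = f"replication.peer.{peer_name}"
    value = ",".join([replica_system, instance_name, zk_quorum])
    return key, value


key, value = peer_property("peer", "peer", "peer:2181")
shell_command = f"config -s {key}={value}"
```

The resulting `shell_command` string has the same form as the shell line above, with `$peer_zk_quorum` filled in.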
 
@@ -666,8 +611,8 @@ root@primary> config -s replication.peer.password.peer=peer

Enable replication on the table

Now that we have defined the peer on the primary and provided the authentication credentials, we need to configure -our table with the implementation of ReplicaSystem we want to use to replicate to the peer. In this case, our peer -is an Accumulo instance, so we want to use the AccumuloReplicaSystem.

+our table with the implementation of ReplicaSystem we want to use to replicate to the peer. In this case, our peer +is an Accumulo instance, so we want to use the AccumuloReplicaSystem.

The configuration for the AccumuloReplicaSystem is the table ID for the table on the peer instance that we want to replicate into. Be sure to use the correct value for $peer_table_id. The configuration key is of @@ -806,6 +751,7 @@ are processed most quickly and pushed through the replication framework.

the WAL is fully replicated to all remote locations.

+
Find documentation for all releases in the archive
diff --git a/feed.xml b/feed.xml index 541d448..f2cfcae 100644 --- a/feed.xml +++ b/feed.xml @@ -6,8 +6,8 @@ https://accumulo.apache.org/ - Wed, 11 Jul 2018 17:09:36 -0400 - Wed, 11 Jul 2018 17:09:36 -0400 + Tue, 17 Jul 2018 10:17:51 -0400 + Tue, 17 Jul 2018 10:17:51 -0400 Jekyll v3.7.3 diff --git a/search_data.json b/search_data.json index 1e23cad..a3a6534 100644 --- a/search_data.json +++ b/search_data.json @@ -58,7 +58,7 @@ "docs-2-0-administration-replication": { "title": "Replication", - "content" : "OverviewReplication is a feature of Accumulo which provides a mechanism to automaticallycopy data to other systems, typically for the purpose of disaster recovery,high availability, or geographic locality. It is best to consider this featureas a framework for automatic replication instead of the ability to copy datafrom to another Accumulo instance as copying to another Accumulo cluster isonly an implementation detail. The local Accumulo cluster is hereby referredto a [...] + "content" : "OverviewReplication is a feature of Accumulo which provides a mechanism to automaticallycopy data to other systems, typically for the purpose of disaster recovery,high availability, or geographic locality. It is best to consider this featureas a framework for automatic replication instead of the ability to copy datafrom to another Accumulo instance as copying to another Accumulo cluster isonly an implementation detail. The local Accumulo cluster is hereby referredto a [...] "url": " /docs/2.0/administration/replication", "categories": "administration" },