From: elserj@apache.org
To: commits@accumulo.apache.org
Reply-To: dev@accumulo.apache.org
Message-Id: <72112f70b4194cc195b283d570b84413@git.apache.org>
Subject: accumulo git commit: ACCUMULO-3502 Update documentation about "server timestamps"
Date: Thu, 22 Jan 2015 02:04:13 +0000 (UTC)

Repository: accumulo
Updated Branches:
  refs/heads/master da3534115 -> 4b1196257

ACCUMULO-3502 Update documentation about "server timestamps"

This started as a realization about server-assigned timestamps,
but was really meant to warn that the non-determinism of multiple
updates to the same exact key is independent of replicas and
the primary.

Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/4b119625
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/4b119625
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/4b119625

Branch: refs/heads/master
Commit: 4b1196257070a1ab788372f03725dc0425567a63
Parents: da35341
Author: Josh Elser
Authored: Wed Jan 21 21:00:16 2015 -0500
Committer: Josh Elser
Committed: Wed Jan 21 21:00:16 2015 -0500

----------------------------------------------------------------------
 docs/src/main/asciidoc/chapters/replication.txt | 34 +++++++++-----------
 1 file changed, 15 insertions(+), 19 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/accumulo/blob/4b119625/docs/src/main/asciidoc/chapters/replication.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/replication.txt b/docs/src/main/asciidoc/chapters/replication.txt
index 5d24649..48f6ffa 100644
--- a/docs/src/main/asciidoc/chapters/replication.txt
+++ b/docs/src/main/asciidoc/chapters/replication.txt
@@ -362,22 +362,18 @@ While there are changes that could be made to the replication implementation whi
 presently, it is not recommended to configure Iterators or Combiners which are not idempotent to support cases where
 inaccuracy of aggregations is not acceptable.
 
-==== Server-Assigned Timestamps
-
-Accumulo has the ability to, when not provided by the client, assign a timestamp to updates made to a table. This is a
-very useful feature as it reduces the amount of code a client must write and also gives some notion of ordering to the
-updates that were made to a table (in addition to some solving some very problematic Accumulo implementation details).
-However, replicating Mutations that were created with a server-assigned timestamp can be very problematic. To understand
-this, we must first start at the BatchWriter.
-
-To allow for efficient ingest into Accumulo, the BatchWriter will collect many mutations, group them into batches and
-send them to the correct server to be applied to the appropriate Tablet. For each Mutation in that batch that the server
-receives, the server will set a timestamp that is at least as large as the last timestamp (to account for clock skew). In short,
-this means that all of the Mutations in this batch will get the same timestamp and be deduplicated in a certain order
-via the in-memory map and recorded in the write-ahead log.
-
-The problem is that these updates could be replayed on the remote in different commit sessions, which means that they
-could result in different RFiles on disk (separate minor-compactions). Because of this, mutations with server-assigned
-timestamps which are written within the same batch have the possibility to be applied in a different order on a peer. In
-the case where a user might submit multiple updates for the same Key in rapid succession, the user should ensure proper
-timestamps are set at the client.
+==== Duplicate Keys
+
+In Accumulo, when more than one key exists that are exactly the same, keys that are equal down to the timestamp,
+the retained value is non-deterministic. Replication introduces another level of non-determinism in this case.
+For a table that is being replicated and has multiple equal keys with different values inserted into it, the final
+value in that table on the primary instance is not guaranteed to be the final value on all replicas.
+
+For example, say the values that were inserted on the primary instance were +value1+ and +value2+ and the final
+value was +value1+, it is not guaranteed that all replicas will have +value1+ like the primary. The final value is
+non-deterministic for each instance.
+
+As is the recommendation without replication enabled, if multiple values for the same key (sans timestamp) are written to
+Accumulo, it is strongly recommended that the value in the timestamp properly reflects the intended version by
+the client. That is to say, newer values inserted into the table should have larger timestamps. If the time between
+writing updates to the same key is significant (order minutes), this concern can likely be ignored.
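
Editorial illustration (not part of the commit): the "Duplicate Keys" text added by this change can be sketched with a small, self-contained Python model. This is not Accumulo code and does not use the Accumulo API. The helper names `final_value` and `outcomes` are hypothetical, and the tie-break rule (the update applied last wins when timestamps are equal) is only an assumption standing in for the behaviour the documentation calls non-deterministic. The sketch shows that, under this model, two updates with the same timestamp can leave different final values depending on the order in which an instance applies them, while giving the newer update a larger timestamp yields the same final value in every order.

```python
from itertools import permutations


def final_value(updates):
    """Return the value a reader would see after applying `updates` in order.

    Each update is a (timestamp, value) pair for one row/column. The entry
    with the largest timestamp wins. On a timestamp tie this toy model keeps
    whichever update was applied last; that rule is an assumption used only
    to make the order dependence visible.
    """
    best_ts, best_value = None, None
    for ts, value in updates:
        if best_ts is None or ts >= best_ts:
            best_ts, best_value = ts, value
    return best_value


def outcomes(updates):
    """Collect the final values produced by every possible apply order."""
    return {final_value(order) for order in permutations(updates)}


# Two writes to the same key with identical timestamps (fully equal keys).
same_ts = [(100, "value1"), (100, "value2")]

# The same two writes, with the newer value given a larger timestamp.
increasing_ts = [(100, "value1"), (101, "value2")]

print(outcomes(same_ts))        # more than one final value is possible
print(outcomes(increasing_ts))  # a single final value, whatever the order
```

In this model, each apply order plays the role of one instance (primary or replica). With equal timestamps the set of possible final values has two members, so instances may disagree; with increasing timestamps it has one.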