accumulo-commits mailing list archives

Subject [6/6] git commit: ACCUMULO-2925 Add warning about server-assigned timestamps with replication
Date Fri, 20 Jun 2014 01:25:25 GMT
ACCUMULO-2925 Add warning about server-assigned timestamps with replication

Leave a note about different updates to equal keys that are assigned the
same timestamp by the server.


Branch: refs/heads/master
Commit: 4d7e90aeef3a6de6a36a30a188d5c1bc564ade3a
Parents: 0676057
Author: Josh Elser <>
Authored: Thu Jun 19 17:58:10 2014 -0700
Committer: Josh Elser <>
Committed: Thu Jun 19 17:58:10 2014 -0700

 docs/src/main/asciidoc/chapters/replication.txt | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/docs/src/main/asciidoc/chapters/replication.txt b/docs/src/main/asciidoc/chapters/replication.txt
index 8755e24..5d24649 100644
--- a/docs/src/main/asciidoc/chapters/replication.txt
+++ b/docs/src/main/asciidoc/chapters/replication.txt
@@ -361,3 +361,23 @@ primary and peer. As such, the SummingCombiner wouldn't be recommended on a tabl
 While there are changes that could be made to the replication implementation which could attempt to mitigate this risk,
 presently, it is not recommended to configure Iterators or Combiners which are not idempotent to support cases where
 inaccuracy of aggregations is not acceptable.
+==== Server-Assigned Timestamps
+Accumulo can assign a timestamp to updates made to a table when the client does not provide one. This is a
+very useful feature as it reduces the amount of code a client must write and also gives some notion of ordering to the
+updates that were made to a table (in addition to solving some very problematic Accumulo implementation details).
+However, replicating Mutations that were created with a server-assigned timestamp can be very problematic. To understand
+this, we must first start at the BatchWriter.
+To allow for efficient ingest into Accumulo, the BatchWriter will collect many mutations, group them into batches, and
+send them to the correct server to be applied to the appropriate Tablet. For each Mutation in that batch that the server
+receives, the server will set a timestamp that is at least as large as the last timestamp (to account for clock skew). In short,
+this means that all of the Mutations in this batch will get the same timestamp and be deduplicated in a certain order
+via the in-memory map and recorded in the write-ahead log.
+The problem is that these updates could be replayed on the remote in different commit sessions, which means that they
+could end up in different RFiles on disk (separate minor compactions). Because of this, mutations with server-assigned
+timestamps which are written within the same batch may be applied in a different order on a peer. In
+the case where a user might submit multiple updates for the same Key in rapid succession, the user should ensure proper
+timestamps are set at the client.
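
The ordering hazard described in the added section can be illustrated with a small, self-contained simulation. This is only a sketch and does not use Accumulo: `server_assign` and `resolve` are hypothetical helpers, and the tie-breaking rule (among equal timestamps, the update applied last wins) is a simplifying assumption standing in for the in-memory map and RFile behavior the text describes. On a real client, a timestamp can be supplied explicitly through the `Mutation.put` overloads that take a `long` timestamp.

```python
from typing import Iterable, List, Tuple

# One update to a single fixed Key: (timestamp, value).
Update = Tuple[int, str]


def server_assign(values: List[str], clock_ms: int, last_ts: int) -> List[Update]:
    """Hypothetical model of server-side assignment: every update in the
    batch gets one timestamp that is at least as large as the last one."""
    ts = max(clock_ms, last_ts)
    return [(ts, value) for value in values]


def resolve(updates_in_apply_order: Iterable[Update]) -> str:
    """Return the value a reader would see for the key.

    Assumed model: the highest timestamp wins; among equal timestamps,
    the update that was applied last wins.
    """
    winner = None
    for ts, value in updates_in_apply_order:
        if winner is None or ts >= winner[0]:
            winner = (ts, value)
    if winner is None:
        raise ValueError("no updates to resolve")
    return winner[1]


# Server-assigned: both updates in the batch share one timestamp.
batch = server_assign(["a", "b"], clock_ms=100, last_ts=90)
primary_view = resolve(batch)                # applied as a, then b
peer_view = resolve(list(reversed(batch)))   # replayed as b, then a

# Client-assigned: distinct timestamps make the result order-independent.
explicit = [(100, "a"), (101, "b")]
primary_explicit = resolve(explicit)
peer_explicit = resolve(list(reversed(explicit)))
```

Under these assumptions, the primary and the peer disagree on the visible value when the timestamps are equal, and agree once the client supplies distinct timestamps, which is the reason for the recommendation in the last sentence of the section.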
