accumulo-commits mailing list archives

Subject accumulo git commit: ACCUMULO-3500 Update replication docs for bulk imports
Date Thu, 22 Jan 2015 15:45:20 GMT
Repository: accumulo
Updated Branches:
  refs/heads/master 4b1196257 -> 80805545e

ACCUMULO-3500 Update replication docs for bulk imports


Branch: refs/heads/master
Commit: 80805545e7617bed41bfd5f50c0ba8032fd71d91
Parents: 4b11962
Author: Josh Elser <>
Authored: Thu Jan 22 10:39:41 2015 -0500
Committer: Josh Elser <>
Committed: Thu Jan 22 10:39:41 2015 -0500

 docs/src/main/asciidoc/chapters/replication.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/docs/src/main/asciidoc/chapters/replication.txt b/docs/src/main/asciidoc/chapters/replication.txt
index 48f6ffa..69bb3c4 100644
--- a/docs/src/main/asciidoc/chapters/replication.txt
+++ b/docs/src/main/asciidoc/chapters/replication.txt
@@ -377,3 +377,13 @@ As is the recommendation without replication enabled, if multiple values for the
 Accumulo, it is strongly recommended that the value in the timestamp properly reflects the intended version by
 the client. That is to say, newer values inserted into the table should have larger timestamps. If the time between
 writing updates to the same key is significant (order minutes), this concern can likely be
+
+==== Bulk Imports
+
+Currently, files that are bulk imported into a table configured for replication are not replicated. There is no
+technical reason why this was not implemented; it was simply omitted from the initial implementation. This is considered a
+fair limitation because bulk importing generated files to multiple locations is much simpler than bifurcating "live" ingest
+data into two instances. Given some existing bulk import process which creates files and then imports them into an
+Accumulo instance, it is trivial to copy those files to a new HDFS instance and import them into another Accumulo
+instance using the same process. Hadoop's +distcp+ command provides an easy way to copy large amounts of data to another
+HDFS instance, which makes the problem of duplicating bulk imports easy to solve.
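The workflow the added docs describe (copy the generated files with distcp, then repeat the bulk import on the peer) can be sketched as below. This is a hypothetical illustration, not part of the commit: the namenode addresses, paths, table name, and credentials are all placeholders, and the commands are printed rather than executed.

```shell
# Hypothetical sketch of duplicating a bulk import onto a peer cluster.
# All hostnames and paths below are placeholder assumptions.
SRC='hdfs://primary-nn:8020/bulk/generated-rfiles'
DST='hdfs://peer-nn:8020/bulk/generated-rfiles'
FAIL='hdfs://peer-nn:8020/bulk/failures'

# 1. Copy the generated RFiles to the peer HDFS instance with distcp.
DISTCP_CMD="hadoop distcp $SRC $DST"

# 2. Re-run the same bulk import on the peer Accumulo instance; the shell's
#    importdirectory command takes the data directory, a failure directory,
#    and a setTime flag.
IMPORT_CMD="accumulo shell -u root -e 'table mytable' -e 'importdirectory $DST $FAIL false'"

# Printed as a dry run, since this is only a sketch.
echo "$DISTCP_CMD"
echo "$IMPORT_CMD"
```

Running the same import process on both clusters keeps the two instances consistent without having to split the live ingest path in two, which is the point the documentation makes.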
