accumulo-commits mailing list archives

Subject svn commit: r1188946 - in /incubator/accumulo/trunk/docs/src/user_manual/chapters: clients.tex high_speed_ingest.tex table_configuration.tex
Date Tue, 25 Oct 2011 22:14:45 GMT
Author: kturner
Date: Tue Oct 25 22:14:45 2011
New Revision: 1188946

ACCUMULO-68 added documentation for isolation and logical time for bulk import to the user


Modified: incubator/accumulo/trunk/docs/src/user_manual/chapters/clients.tex
--- incubator/accumulo/trunk/docs/src/user_manual/chapters/clients.tex (original)
+++ incubator/accumulo/trunk/docs/src/user_manual/chapters/clients.tex Tue Oct 25 22:14:45 2011
@@ -109,6 +109,32 @@ for(Entry<Key,Value> entry : scan) {
+\subsection{Isolated Scanner}
+Accumulo supports the ability to present an isolated view of rows when
+scanning.  There are three possible ways that a row could change in Accumulo:
+\begin{itemize}
+ \item a mutation applied to a table
+ \item iterators executed as part of a minor or major compaction
+ \item bulk import of new files
+\end{itemize}
+Isolation guarantees that either all or none of the changes made by these
+operations on a row are seen.  Use the IsolatedScanner to obtain an isolated
+view of an Accumulo table.  When using the regular scanner it is possible to see
+a non-isolated view of a row.  For example, if a mutation modifies three
+columns, it is possible that you will only see two of those modifications.
+With the isolated scanner, either all three of the changes are seen or none.
+The IsolatedScanner buffers rows on the client side so a large row will not
+crash a tablet server.  By default rows are buffered in memory, but the user
+can easily supply their own buffer if they wish to buffer to disk when rows are
+large.
+For an example, look at the following\\
 For some types of access, it is more efficient to retrieve several ranges
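
The client-side row buffering described in the Isolated Scanner hunk can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: the class `RowBufferSketch`, its `Entry` type, and `bufferRows` are hypothetical names invented for illustration and are not part of the Accumulo API. It covers the buffering step only (collecting every entry of a row before handing the row to the caller); how the real scanner detects that a row changed mid-read is outside this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of client-side row buffering; hypothetical, not the Accumulo API. */
class RowBufferSketch {

    /** One key/value pair, reduced to the fields this sketch needs. */
    static final class Entry {
        final String row;
        final String column;
        final String value;

        Entry(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    /**
     * Groups a row-sorted list of entries into complete rows, so that the
     * caller only ever receives whole rows, never a partial one.
     */
    static List<List<Entry>> bufferRows(List<Entry> sorted) {
        List<List<Entry>> rows = new ArrayList<>();
        List<Entry> current = new ArrayList<>();
        for (Entry e : sorted) {
            // A change of row id closes the buffer for the previous row.
            if (!current.isEmpty() && !current.get(0).row.equals(e.row)) {
                rows.add(current);
                current = new ArrayList<>();
            }
            current.add(e);
        }
        if (!current.isEmpty()) {
            rows.add(current);
        }
        return rows;
    }
}
```

In this model the buffer is an in-memory list; a disk-backed buffer, as the text suggests for large rows, would replace the inner `ArrayList`.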

Modified: incubator/accumulo/trunk/docs/src/user_manual/chapters/high_speed_ingest.tex
--- incubator/accumulo/trunk/docs/src/user_manual/chapters/high_speed_ingest.tex (original)
+++ incubator/accumulo/trunk/docs/src/user_manual/chapters/high_speed_ingest.tex Tue Oct 25 22:14:45 2011
@@ -108,6 +108,26 @@ second directory specified.
 A complete example of using Bulk Ingest can be found at\\
+\section{Logical Time for Bulk Ingest}
+Logical time is important for bulk imported data, for which the client code may
+be choosing a timestamp. At bulk import time, the user can choose to enable
+logical time for the set of files being imported.  When it is enabled, Accumulo
+uses a specialized system iterator to lazily set times in a bulk imported file.
+This mechanism guarantees that times set by unsynchronized multi-node
+applications (such as those running on MapReduce) will maintain some semblance
+of causal ordering. This mitigates the problem of the time being wrong on the
+system that created the file for bulk import. These times are not set when the
+file is imported, but whenever it is read by scans or compactions. At import, a
+time is obtained and always used by the specialized system iterator to set that
+time.
+The timestamp assigned by Accumulo will be the same for every key in the file.
+This could cause problems if the file contains multiple keys that are identical
+except for the timestamp.  In this case, the sort order of the keys will be
+undefined. This could occur if an insert and an update were in the same bulk
+import file.
 \section{MapReduce Ingest}
 It is possible to efficiently write many mutations to Accumulo in parallel via a
 MapReduce job. In this scenario the MapReduce is written to process data that lives
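
The caveat in the Logical Time for Bulk Ingest hunk (one import time for every key in a file) can be illustrated with a toy model. Everything here is an assumption made for illustration: `BulkImportTimeSketch`, `Key`, `readWithImportTime`, and `countCollisions` are hypothetical names, and the code does not reproduce Accumulo's system iterator. It only shows that replacing each key's timestamp with a single import time at read time makes keys that differed solely by timestamp indistinguishable.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of lazily applied bulk-import timestamps; hypothetical, not Accumulo code. */
class BulkImportTimeSketch {

    /** A key reduced to row, column, and timestamp. */
    static final class Key {
        final String row;
        final String column;
        final long timestamp;

        Key(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }

        /** True when the two keys differ at most in their timestamp. */
        boolean sameCell(Key other) {
            return row.equals(other.row) && column.equals(other.column);
        }
    }

    /** Returns the keys as a reader would see them: every timestamp replaced by importTime. */
    static List<Key> readWithImportTime(List<Key> file, long importTime) {
        List<Key> seen = new ArrayList<>();
        for (Key k : file) {
            seen.add(new Key(k.row, k.column, importTime));
        }
        return seen;
    }

    /** Counts adjacent keys that are identical in row, column, and timestamp. */
    static int countCollisions(List<Key> keys) {
        int collisions = 0;
        for (int i = 1; i < keys.size(); i++) {
            Key a = keys.get(i - 1);
            Key b = keys.get(i);
            if (a.sameCell(b) && a.timestamp == b.timestamp) {
                collisions++;
            }
        }
        return collisions;
    }
}
```

A file holding an insert and a later update of the same cell has no collisions as written, but one collision once both keys carry the import time, which is the situation the text warns about.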

Modified: incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex
--- incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex (original)
+++ incubator/accumulo/trunk/docs/src/user_manual/chapters/table_configuration.tex Tue Oct 25 22:14:45 2011
@@ -218,8 +218,10 @@ table.iterator.majc.vers.opt.maxVersions
 Accumulo 1.2 introduces the concept of logical time. This ensures that timestamps
 set by accumulo always move forward. This helps avoid problems caused by
 TabletServers that have different time settings. The per tablet counter gives unique
-one up time stamps on a per mutation basis. When using time in milliseconds, if two
-things arrive within the same millisecond then both receive the same timestamp.
+one up time stamps on a per mutation basis. When using time in milliseconds, if
+two things arrive within the same millisecond then both receive the same
+timestamp.  Even when using time in milliseconds, timestamps set by Accumulo
+always move forward and never backward.
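
The two behaviours described in this hunk can be modelled in a few lines. This is a sketch, not Accumulo's implementation: `TabletTimeSketch`, `LogicalTime`, and `MillisTime` are hypothetical names, and clamping with `Math.max` is an assumed way of keeping millisecond time from moving backwards.

```java
/** Toy models of the two time types; hypothetical names, not Accumulo code. */
class TabletTimeSketch {

    /** Logical time: a per-tablet counter that yields a new value per mutation. */
    static final class LogicalTime {
        private long counter = 0;

        long next() {
            return ++counter;
        }
    }

    /** Millisecond time, clamped so that it never moves backwards. */
    static final class MillisTime {
        private long last = Long.MIN_VALUE;

        /** wallClockMillis stands in for the server's current clock reading. */
        long next(long wallClockMillis) {
            // Assumption: a clock reading older than the last timestamp is ignored.
            last = Math.max(last, wallClockMillis);
            return last;
        }
    }
}
```

With `MillisTime`, two calls in the same millisecond return the same value, and a call with an earlier clock reading repeats the last value instead of going back; `LogicalTime` returns a distinct, increasing value on every call.
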
 A table can be configured to use logical timestamps at creation time as follows:
