# accumulo-commits mailing list archives

From vi...@apache.org
Subject svn commit: r1482357 - in /accumulo/branches/1.5/docs/src/main/latex: accumulo_developer_manual/ accumulo_user_manual/ accumulo_user_manual/chapters/
Date Tue, 14 May 2013 14:47:34 GMT
Author: vines
Date: Tue May 14 14:47:33 2013
New Revision: 1482357

URL: http://svn.apache.org/r1482357
Log:
ACCUMULO-992 - unmerging the unmerge, sans broken image

Modified:
accumulo/branches/1.5/docs/src/main/latex/accumulo_developer_manual/accumulo_developer_manual.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/accumulo_user_manual.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/analytics.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/clients.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/design.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/development_clients.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/high_speed_ingest.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/introduction.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/security.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/shell.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/table_configuration.tex
accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/table_design.tex

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_developer_manual/accumulo_developer_manual.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_developer_manual/accumulo_developer_manual.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_developer_manual/accumulo_developer_manual.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_developer_manual/accumulo_developer_manual.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%
@@ -75,7 +75,7 @@ Each of the aforementioned internal comp
Figure \ref{fig_ts_rw} shows the Tablet Server data flow during regular read/write operations.
All of the descriptions in this section will refer to the data flows shown in figure \ref{fig_ts_rw}.
-In (1) and (8), the Client contacts RPCs within the Thrift service hosted on the TabletServer.
+In (1) and (8), the Client contacts RPCs within the Thrift service hosted on the TabletServer.
These RPCs are all the org.apache.accumulo.server.tabletserver.TabletServer.ThriftClientHandler, that implements the org.apache.accumulo.core.tabletserver.thrift.ThriftClientHandler.Iface interface.
Methods within this interface are divided into read and write methods.

@@ -101,7 +101,7 @@ closeUpdate(long) returns the list of ta

When mutations arrive at the tablet server (1), they are cached briefly to queue up many streamed mutations for the same session.
If cache memory fills, or a session is closed, the Mutations are flushed (2) through to a local Logger component which
-forwards the updates to a remote Logger service (4) using a RPC interface.
+forwards the updates to a remote Logger service (4) using a RPC interface.
In order to provide redundancy, the mutations are sent to two different Logger servers in the cluster.
The Tablet Server records what logs are used (3) by each tablet in the Metadata Table describing the tablet.
When a Tablet Server fails, its Tablets are reassigned by the Master,

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/accumulo_user_manual.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/accumulo_user_manual.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/accumulo_user_manual.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/accumulo_user_manual.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%

==============================================================================
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%
@@ -153,11 +153,11 @@ acting as TabletServers.
\end{verbatim}
\normalsize

-The instance needs a secret to enable secure communication between servers.  Configure your
+The instance needs a secret to enable secure communication between servers. Configure your
secret and make sure that the \texttt{accumulo-site.xml} file is not readable to other users.

Some settings can be modified via the Accumulo shell and take effect immediately, but
-some settings require a process restart to take effect.  See the configuration documentation
+some settings require a process restart to take effect. See the configuration documentation
(available on the monitor web pages) for details.

\subsection{Deploy Configuration}

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/analytics.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/analytics.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/analytics.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/analytics.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%
@@ -105,7 +105,7 @@ AccumuloInputFormat.setRanges(job, range
\end{verbatim}
\normalsize

-To restrict accumulo to a list of columns:
+To restrict Accumulo to a list of columns:

\small
\begin{verbatim}

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/clients.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/clients.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/clients.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/clients.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%
@@ -18,7 +18,7 @@

\section{Running Client Code}

-There are multiple ways to run Java code that uses Accumulo.  Below is a list
+There are multiple ways to run Java code that uses Accumulo. Below is a list
of the different ways to execute client code.

\begin{itemize}
@@ -28,11 +28,11 @@ of the different ways to execute client
\end{itemize}

In order to run client code written to run against Accumulo, you will need to
-include the jars that Accumulo depends on in your classpath.  Accumulo client
+include the jars that Accumulo depends on in your classpath. Accumulo client
of the jars in the Hadoop lib directory, and the conf directory to the
classpath. For Zookeeper 3.3 you only need to add the Zookeeper jar, and not
-what is in the Zookeeper lib directory.  You can run the following command on a
+what is in the Zookeeper lib directory. You can run the following command on a
configured Accumulo system to see what its using for its classpath.

\small
@@ -42,13 +42,13 @@ $ACCUMULO_HOME/bin/accumulo classpath
\normalsize

Another option for running your code is to put a jar file in
-\texttt{\$ACCUMULO\_HOME/lib/ext}.  After doing this you can use the accumulo
-script to execute your code.  For example if you create a jar containing the
+\texttt{\$ACCUMULO\_HOME/lib/ext}. After doing this you can use the accumulo
+script to execute your code. For example if you create a jar containing the
class com.foo.Client and placed that in lib/ext, then you could use the command
\texttt{\$ACCUMULO\_HOME/bin/accumulo com.foo.Client} to execute your code.

If you are writing map reduce job that access Accumulo, then you can use the
-bin/tool.sh script to run those jobs.  See the map reduce example.
+bin/tool.sh script to run those jobs. See the map reduce example.

\section{Connecting}

@@ -61,7 +61,7 @@ String instanceName = "myinstance";
String zooServers = "zooserver-one,zooserver-two"
Instance inst = new ZooKeeperInstance(instanceName, zooServers);

-Connector conn = inst.getConnector("user", "passwd");
+Connector conn = inst.getConnector("user", new PasswordToken("passwd"));
\end{verbatim}
\normalsize

@@ -148,7 +148,7 @@ for(Entry<Key,Value> entry : scan) {
\subsection{Isolated Scanner}

Accumulo supports the ability to present an isolated view of rows when
-scanning.  There are three possible ways that a row could change in accumulo :
+scanning. There are three possible ways that a row could change in Accumulo :

\begin{itemize}
\item a mutation applied to a table
@@ -157,14 +157,14 @@ scanning.  There are three possible ways
\end{itemize}

Isolation guarantees that either all or none of the changes made by these
-operations on a row are seen.  Use the IsolatedScanner to obtain an isolated
-view of an accumulo table.  When using the regular scanner it is possible to see
-a non isolated view of a row.  For example if a mutation modifies three
+operations on a row are seen. Use the IsolatedScanner to obtain an isolated
+view of an Accumulo table. When using the regular scanner it is possible to see
+a non isolated view of a row. For example if a mutation modifies three
columns, it is possible that you will only see two of those modifications.
With the isolated scanner either all three of the changes are seen or none.

The IsolatedScanner buffers rows on the client side so a large row will not
-crash a tablet server.  By default rows are buffered in memory, but the user
+crash a tablet server. By default rows are buffered in memory, but the user
can easily supply their own buffer if they wish to buffer to disk when rows are
large.

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/design.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/design.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/design.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/design.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%
@@ -64,7 +64,7 @@ found in the write-ahead log to the tabl

\subsection{Garbage Collector}

-Accumulo processes will share files stored in HDFS.  Periodically, the Garbage
+Accumulo processes will share files stored in HDFS. Periodically, the Garbage
Collector will identify files that are no longer needed by any process, and
delete them.

@@ -77,7 +77,7 @@ tablets are assigned to one TabletServer
and deletion requests from clients. The Master also coordinates startup, graceful
shutdown and recovery of changes in write-ahead logs when Tablet servers fail.

-Multiple masters may be run.  The masters will choose among themselves a single master,
+Multiple masters may be run. The masters will choose among themselves a single master,
and the others will become backups if the master should fail.

\subsection{Client}
@@ -108,7 +108,7 @@ When a write arrives at a TabletServer i
then inserted into a sorted data structure in memory called a MemTable. When the
MemTable reaches a certain size the TabletServer writes out the sorted key-value
pairs to a file in HDFS called Indexed Sequential Access Method (ISAM)
-file. This process is called a minor compaction.  A new MemTable is then created
+file. This process is called a minor compaction. A new MemTable is then created
and the fact of the compaction is recorded in the Write-Ahead Log.

When a request to read data arrives at a TabletServer, the TabletServer does a
@@ -128,20 +128,20 @@ delete entry when the new file is create

\section{Splitting}

-When a table is created it has one tablet.  As the table grows its initial
-tablet eventually splits into two tablets.   Its likely that one of these
-tablets will migrate to another tablet server.  As the table continues to grow,
-its tablets will continue to split and be migrated.  The decision to
-automatically split a tablet is based on the size of a tablets files.   The
-size threshold at which a tablet splits is configurable per table.  In addition
+When a table is created it has one tablet. As the table grows its initial
+tablet eventually splits into two tablets. Its likely that one of these
+tablets will migrate to another tablet server. As the table continues to grow,
+its tablets will continue to split and be migrated. The decision to
+automatically split a tablet is based on the size of a tablets files. The
+size threshold at which a tablet splits is configurable per table. In addition
to automatic splitting, a user can manually add split points to a table to
-create new tablets.  Manually splitting a new table can parallelize reads and
+create new tablets. Manually splitting a new table can parallelize reads and
writes giving better initial performance without waiting for automatic
splitting.

-As data is deleted from a table, tablets may shrink.  Over time this can lead
-to small or empty tablets.   To deal with this, merging of tablets was
-introduced in Accumulo 1.4.  This is discussed in more detail later.
+As data is deleted from a table, tablets may shrink. Over time this can lead
+to small or empty tablets. To deal with this, merging of tablets was
+introduced in Accumulo 1.4. This is discussed in more detail later.

\section{Fault-Tolerance}

@@ -152,7 +152,7 @@ Log to prevent any loss of data.

The Master will coordinate the copying of write-ahead logs to HDFS so the logs
are available to all tablet servers. To make recovery efficient, the updates
-within a log are grouped by tablet.  TabletServers can quickly apply the
+within a log are grouped by tablet. TabletServers can quickly apply the
mutations from the sorted logs that are destined for the tablets they have now
been assigned.

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/development_clients.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/development_clients.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/development_clients.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/development_clients.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%
@@ -16,7 +16,7 @@

\chapter{Development Clients}

-Normally, Accumulo consists of lots of moving parts.  Even a stand-alone version of
+Normally, Accumulo consists of lots of moving parts. Even a stand-alone version of
Accumulo requires Hadoop, Zookeeper, the Accumulo master, a tablet server, etc. If
you want to write a unit test that uses Accumulo, you need a lot of infrastructure
in place before your test can run.
@@ -33,7 +33,7 @@ While normal interaction with the Accumu
\small
\begin{verbatim}
Instance instance = new ZooKeeperInstance(...);
-Connector conn = instance.getConnector(user, passwd);
\end{verbatim}
\normalsize

@@ -109,7 +109,7 @@ Once we have our mini cluster running, w
\small
\begin{verbatim}
Instance instance = new ZooKeeperInstance(accumulo.getInstanceName(), accumulo.getZooKeepers());
\end{verbatim}
\normalsize

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/high_speed_ingest.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/high_speed_ingest.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/high_speed_ingest.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/high_speed_ingest.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%

Logical time is important for bulk imported data, for which the client code may
be choosing a timestamp. At bulk import time, the user can choose to enable
-logical time for the set of files being imported.  When its enabled, Accumulo
+logical time for the set of files being imported. When its enabled, Accumulo
uses a specialized system iterator to lazily set times in a bulk imported file.
This mechanism guarantees that times set by unsynchronized multi-node
applications (such as those running on MapReduce) will maintain some semblance
@@ -122,9 +122,9 @@ file is imported, but whenever it is rea
time is obtained and always used by the specialized system iterator to set that
time.

-The timestamp assigned by accumulo will be the same for every key in the file.
+The timestamp assigned by Accumulo will be the same for every key in the file.
This could cause problems if the file contains multiple keys that are identical
-except for the timestamp.  In this case, the sort order of the keys will be
+except for the timestamp. In this case, the sort order of the keys will be
undefined. This could occur if an insert and an update were in the same bulk
import file.

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/introduction.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/introduction.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/introduction.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/introduction.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/security.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/security.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/security.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/security.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%
@@ -104,18 +104,18 @@ Scanner s = connector.createScanner("tab

\section{User Authorizations}

-Each accumulo user has a set of associated security labels.  To manipulate
-these in the shell use the setuaths and getauths commands.  These may also be
-modified using the java security operations API.
+Each Accumulo user has a set of associated security labels. To manipulate
+these in the shell while using the default authorizor, use the setauths and getauths commands.
+These may also be modified for the default authorizor using the java security operations API.

-When a user creates a scanner a set of Authorizations is passed.  If the
+When a user creates a scanner a set of Authorizations is passed. If the
authorizations passed to the scanner are not a subset of the users
authorizations, then an exception will be thrown.

To prevent users from writing data they can not read, add the visibility
-constraint to a table.  Use the -evc option in the createtable shell command to
-enable this constraint.  For existing tables use the following shell command to
-enable the visibility constraint.  Ensure the constraint number does not
+constraint to a table. Use the -evc option in the createtable shell command to
+enable this constraint. For existing tables use the following shell command to
+enable the visibility constraint. Ensure the constraint number does not
conflict with any existing constraints.

\small
@@ -128,13 +128,40 @@ Any user with the alter table permission
This constraint is not applied to bulk imported data, if this a concern then
disable the bulk import permission.

+\section{Pluggable Security}
+
+New in Accumulo 1.5 is a pluggable security mechanism. It can be broken into three actions:
+authentication, authorization, and permission handling. By default all of these are handled in
+Zookeeper, which is how things were handled in Accumulo 1.4 and before. Note that this is a
+new feature in 1.5 and may be adjusted in future releases without the standard
+deprecation cycle.
+
+Authentication handles the ability for a user to verify their identity. A combination of
+principal and authentication token is used to verify that a user is who they say they are. An
+authentication token can be constructed directly through its constructor, but it is
+advised to use the init(Property) method to populate an authentication token. It is expected that a
+user knows what the appropriate token to use for their system is. The default token is
+
+Once a user is authenticated by the Authenticator, the user has access to the other actions within
+Accumulo. All actions in Accumulo are ACLed, and this ACL check is handled by the Permission
+Handler. This is what manages all of the permissions, which are divided in system and per table
+level. From there, if a user is doing an action which requires authorizations, the Authorizor is
+queried to determine what authorizations the user has.
+
+This setup allows a variety of different mechanisms to be used for handling different aspects of
+Accumulo's security. A system like Kerberos can be used for authentication, then a system like LDAP
+could be used to determine if a user has a specific permission, and then it may default back to the
+default ZookeeperAuthorizor to determine what Authorizations a user is ultimately allowed to use.
+This is a pluggable system so custom components can be created depending on your need.
+
\section{Secure Authorizations Handling}

-For applications serving many users, it is not expected that an accumulo user
-will be created for each application user.  In this case an accumulo user with
-all authorizations needed by any of the applications users must be created.  To
+For applications serving many users, it is not expected that an Accumulo user
+will be created for each application user. In this case an Accumulo user with
+all authorizations needed by any of the applications users must be created. To
service queries, the application should create a scanner with the application
-user's authorizations.  These authorizations could be obtained from a trusted 3rd
+user's authorizations. These authorizations could be obtained from a trusted 3rd
party.

Often production systems will integrate with Public-Key Infrastructure (PKI) and

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/shell.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/shell.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/shell.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/shell.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%
@@ -17,7 +17,7 @@
\chapter{Accumulo Shell}
Accumulo provides a simple shell that can be used to examine the contents and
configuration settings of tables, insert/update/delete values, and change
-configuration settings.
+configuration settings.

The shell can be started by the following command:

@@ -87,7 +87,7 @@ row1 colf:colq [] value1
\end{verbatim}
\normalsize

-The value in brackets "[]" would be the visibility labels.  Since none were used, this is empty for this row.
+The value in brackets "[]" would be the visibility labels. Since none were used, this is empty for this row.
You can use the "-st" option to scan to see the timestamp for the cell, too.

\section{Table Maintenance}

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/table_configuration.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/table_configuration.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/table_configuration.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/table_configuration.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with
% The ASF licenses this file to You under the Apache License, Version 2.0
% (the "License"); you may not use this file except in compliance with
-% the License.  You may obtain a copy of the License at
+% the License. You may obtain a copy of the License at
%
%
@@ -202,7 +202,7 @@ Accumulo provides the capability to mana
timestamps within the Key. If a timestamp is not specified in the key created by the
client then the system will set the timestamp to the current time. Two keys with
identical rowIDs and columns but different timestamps are considered two versions
-of the same key. If two inserts are made into accumulo with the same rowID,
+of the same key. If two inserts are made into Accumulo with the same rowID,
column, and timestamp, then the behavior is non-deterministic.

Timestamps are sorted in descending order, so the most recent data comes first.
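
The version ordering described in this hunk can be sketched in plain Java. This is only an illustration under stated assumptions: `SimpleKey`, `compare` and `newestTimestamp` are hypothetical names invented here, not the real `org.apache.accumulo.core.data.Key` API, and the sketch ignores column visibility.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VersionOrderSketch {
    // Hypothetical stand-in for an Accumulo key (assumption, not the real Key class).
    static final class SimpleKey {
        final String row;
        final String column;
        final long timestamp;

        SimpleKey(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    // Row and column sort ascending; timestamps sort descending,
    // so the most recent version of a key comes first.
    static int compare(SimpleKey a, SimpleKey b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) {
            return c;
        }
        c = a.column.compareTo(b.column);
        if (c != 0) {
            return c;
        }
        return Long.compare(b.timestamp, a.timestamp);
    }

    // Sorts a copy of the versions and returns the timestamp that a reader
    // keeping only one version would see. Assumes a non-empty list.
    static long newestTimestamp(List<SimpleKey> versions) {
        List<SimpleKey> sorted = new ArrayList<>(versions);
        sorted.sort(VersionOrderSketch::compare);
        return sorted.get(0).timestamp;
    }

    public static void main(String[] args) {
        List<SimpleKey> versions = Arrays.asList(
            new SimpleKey("row1", "colf:colq", 100L),
            new SimpleKey("row1", "colf:colq", 300L),
            new SimpleKey("row1", "colf:colq", 200L));
        System.out.println(newestTimestamp(versions));
    }
}
```

Because the timestamp comparison is reversed, a reader that keeps only the first entry per row and column sees the newest version without scanning the older ones.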
@@ -223,8 +223,8 @@ user@myinstance mytable> config -t mytab
\normalsize

When a table is created, by default its configured to use the
-VersioningIterator and keep one version.  A table can be created without the
-VersioningIterator with the -ndi option in the shell.  Also the Java API
+VersioningIterator and keep one version. A table can be created without the
+VersioningIterator with the -ndi option in the shell. Also the Java API
has the following method

\small
@@ -237,11 +237,11 @@ connector.tableOperations.create(String
\subsubsection{Logical Time}

Accumulo 1.2 introduces the concept of logical time. This ensures that timestamps
-set by accumulo always move forward. This helps avoid problems caused by
+set by Accumulo always move forward. This helps avoid problems caused by
TabletServers that have different time settings. The per tablet counter gives unique
one up time stamps on a per mutation basis. When using time in milliseconds, if
two things arrive within the same millisecond then both receive the same
-timestamp.  When using time in milliseconds, accumulo set times will still
+timestamp. When using time in milliseconds, Accumulo set times will still
always move forward and never backwards.

A table can be configured to use logical timestamps at creation time as follows:
@@ -253,8 +253,8 @@ user@myinstance> createtable -tl logical
\normalsize

\subsubsection{Deletes}
-Deletes are special keys in accumulo that get sorted along with all the other data.
-When a delete key is inserted, accumulo will not show anything that has a
+Deletes are special keys in Accumulo that get sorted along with all the other data.
+When a delete key is inserted, Accumulo will not show anything that has a
timestamp less than or equal to the delete key. During major compaction, any keys
older than a delete key are omitted from the new file created, and the omitted keys
are removed from disk as part of the regular garbage collection process.
@@ -341,7 +341,7 @@ rowID1  colfA  colqA     -          2
\end{verbatim}
\normalsize

-Combiners can be enabled for a table using the setiter command in the shell.  Below is an example.
+Combiners can be enabled for a table using the setiter command in the shell. Below is an example.

\small
\begin{verbatim}
@@ -366,7 +366,7 @@ foo day:20080103 []    1
\end{verbatim}
\normalsize

-Accumulo includes some useful Combiners out of the box.  To find these look in
+Accumulo includes some useful Combiners out of the box. To find these look in
the\\ \texttt{org.apache.accumulo.core.iterators.user} package.

Additional Combiners can be added by creating a Java class that extends\\
@@ -401,22 +401,22 @@ It is enabled by default for the !METADA

\section{Compaction}

-As data is written to Accumulo it is buffered in memory.  The data buffered in
-memory is eventually written to HDFS on a per tablet basis.  Files can also be
-added to tablets directly by bulk import.  In the background tablet servers run
-major compactions to merge multiple files into one.  The tablet server has to
+As data is written to Accumulo it is buffered in memory. The data buffered in
+memory is eventually written to HDFS on a per tablet basis. Files can also be
+added to tablets directly by bulk import. In the background tablet servers run
+major compactions to merge multiple files into one. The tablet server has to
decide which tablets to compact and which files within a tablet to compact.
This decision is made using the compaction ratio, which is configurable on a
-per table basis.  To configure this ratio modify the following property:
+per table basis. To configure this ratio modify the following property:

\begin{verbatim}
table.compaction.major.ratio
\end{verbatim}

Increasing this ratio will result in more files per tablet and less compaction
-work.  More files per tablet means higher query latency.  So adjusting
-this ratio is a trade off between ingest and query performance.  The ratio
-defaults to 3.
+work. More files per tablet means higher query latency. So adjusting
+this ratio is a trade off between ingest and query performance. The ratio
+defaults to 3.

The way the ratio works is that a set of files is compacted into one file if the
sum of the sizes of the files in the set is larger than the ratio multiplied by
@@ -426,35 +426,35 @@ remaining files are considered for compa
compaction is triggered or there are no files left to consider.

The number of background threads tablet servers use to run major compactions is
-configurable.  To configure this modify the following property:
+configurable. To configure this modify the following property:

\begin{verbatim}
tserver.compaction.major.concurrent.max
\end{verbatim}

Also, the number of threads tablet servers use for minor compactions is
-configurable.  To configure this modify the following property:
+configurable. To configure this modify the following property:

\begin{verbatim}
tserver.compaction.minor.concurrent.max
\end{verbatim}

The numbers of minor and major compactions running and queued are visible on the
-Accumulo monitor page.  This allows you to see if compactions are backing up
-and adjustments to the above settings are needed.  When adjusting the number of
+Accumulo monitor page. This allows you to see if compactions are backing up
+and adjustments to the above settings are needed. When adjusting the number of
threads available for compactions, consider the number of cores and other tasks
running on the nodes such as maps and reduces.

If major compactions are not keeping up, then the number of files per tablet
will grow to a point such that query performance starts to suffer. One way to
-handle this situation is to increase the compaction ratio.  For example, if the
+handle this situation is to increase the compaction ratio. For example, if the
compaction ratio were set to 1, then every new file added to a tablet by minor
compaction would immediately queue the tablet for major compaction. So if a
tablet has a 200M file and minor compaction writes a 1M file, then the major
-compaction will attempt to merge the 200M and 1M file.  If the tablet server
+compaction will attempt to merge the 200M and 1M file. If the tablet server
has lots of tablets trying to do this sort of thing, then major compactions
will back up and the number of files per tablet will start to grow, assuming
-data is being continuously written.  Increasing the compaction ratio will
+data is being continuously written. Increasing the compaction ratio will
alleviate backups by lowering the amount of major compaction work that needs to
be done.

@@ -466,12 +466,12 @@ table.file.max
\end{verbatim}

When a tablet reaches this number of files and needs to flush its in-memory
-data to disk, it will choose to do a merging minor compaction.  A merging minor
+data to disk, it will choose to do a merging minor compaction. A merging minor
compaction will merge the tablet's smallest file with the data in memory at
-minor compaction time.  Therefore the number of files will not grow beyond this
-limit.  This will make minor compactions take longer, which will cause ingest
-performance to decrease.  This can cause ingest to slow down until major
-compactions have enough time to catch up.   When adjusting this property, also
+minor compaction time. Therefore the number of files will not grow beyond this
+limit. This will make minor compactions take longer, which will cause ingest
+performance to decrease. This can cause ingest to slow down until major
+compactions have enough time to catch up. When adjusting this property, also
consider adjusting the compaction ratio. Ideally, merging minor compactions
never need to occur and major compactions will keep up. It is possible to
configure the file max and compaction ratio such that only merging minor
@@ -480,20 +480,20 @@ because doing only merging minor compact
The amount of work done by major compactions is $O(N*\log_R(N))$ where
\textit{R} is the compaction ratio.

-Compactions can be initiated manually for a table.  To initiate a minor
-compaction, use the flush command in the shell.  To initiate a major compaction,
-use the compact command in the shell.  The compact command will compact all
-tablets in a table to one file.  Even tablets with one file are compacted.  This
+Compactions can be initiated manually for a table. To initiate a minor
+compaction, use the flush command in the shell. To initiate a major compaction,
+use the compact command in the shell. The compact command will compact all
+tablets in a table to one file. Even tablets with one file are compacted. This
is useful for the case where a major compaction filter is configured for a
-table. In 1.4 the ability to compact a range of a table was added.  To use this
-feature specify start and stop rows for the compact command.  This will only
+table. In 1.4 the ability to compact a range of a table was added. To use this
+feature specify start and stop rows for the compact command. This will only
compact tablets that overlap the given row range.

\section{Pre-splitting tables}

Accumulo will balance and distribute tables across servers. Before a
table gets large, it will be maintained as a single tablet on a single
-server.  This limits the speed at which data can be added or queried
+server. This limits the speed at which data can be added or queried
to the speed of a single node. To improve performance when a table
is new, or small, you can add split points and generate new tablets.

@@ -506,26 +506,26 @@ root@myinstance> addsplits -t newTable g
\end{verbatim}
\normalsize

-This will create a new table with 4 tablets.  The table will be split
+This will create a new table with 4 tablets. The table will be split
on the letters ``g'', ``n'', and ``t'' which will work nicely if the
row data start with lower-case alphabetic characters. If your row
data includes binary information or numeric information, or if the
distribution of the row information is not flat, then you would pick
-different split points.  Now ingest and query can proceed on 4 nodes
+different split points. Now ingest and query can proceed on 4 nodes
which can improve performance.

\section{Merging tablets}

Over time, a table can get very large, so large that it has hundreds
-of thousands of split points.  Once there are enough tablets to spread
+of thousands of split points. Once there are enough tablets to spread
a table across the entire cluster, additional splits may not improve
-performance, and may create unnecessary bookkeeping.  The distribution
-of data may change over time.  For example, if row data contains date
+performance, and may create unnecessary bookkeeping. The distribution
+of data may change over time. For example, if row data contains date
information, and data is continually added and removed to maintain a
window of current information, tablets for older rows may be empty.

Accumulo supports tablet merging, which can be used to reduce
-the number of split points.  The following command will merge all rows
+the number of split points. The following command will merge all rows
from ``A'' to ``Z'' into a single tablet:

\small
@@ -545,7 +545,7 @@ root@myinstance> config -t myTable -s ta
\end{verbatim}
\normalsize

-In order to merge small tablets, you can ask accumulo to merge
+In order to merge small tablets, you can ask Accumulo to merge
sections of a table smaller than a given size.

\small
@@ -555,8 +555,8 @@ root@myinstance> merge -t myTable -s 100
\normalsize

By default, small tablets will not be merged into tablets that are
-already larger than the given size.  This can leave isolated small
-tablets.  To force small tablets to be merged into larger tablets use
+already larger than the given size. This can leave isolated small
+tablets. To force small tablets to be merged into larger tablets use
the ``--{}--force'' option:

\small
@@ -565,7 +565,7 @@ root@myinstance> merge -t myTable -s 100
\end{verbatim}
\normalsize

-Merging away small tablets works on one section at a time.  If your
+Merging away small tablets works on one section at a time. If your
table contains many sections of small split points, or you are
attempting to change the split size of the entire table, it will be
faster to set the split point and merge the entire table:
@@ -581,10 +581,10 @@ root@myinstance> merge -t myTable

Consider an indexing scheme that uses date information in each row.
For example ``20110823-15:20:25.013'' might be a row that specifies a
-date and time.  In some cases, we might like to delete rows based on
+date and time. In some cases, we might like to delete rows based on
this date, say to remove all the data older than the current year.
Accumulo supports a delete range operation which efficiently
-removes data between two rows.  For example:
+removes data between two rows. For example:

\small
\begin{verbatim}
@@ -593,7 +593,7 @@ root@myinstance> deleterange -t myTable
\normalsize

This will delete all rows starting with ``2010'' and it will stop at
-any row starting ``2011''.  You can delete any data prior to 2011
+any row starting ``2011''. You can delete any data prior to 2011
with:

\small
@@ -610,23 +610,23 @@ positions, and will affect the number of

\section{Cloning Tables}

-A new table can be created that points to an existing table's data.  This is a
-very quick metadata operation, no data is actually copied.  The cloned table
-and the source table can change independently after the clone operation.  One
-use case for this feature is testing.  For example to test a new filtering
+A new table can be created that points to an existing table's data. This is a
+very quick metadata operation, no data is actually copied. The cloned table
+and the source table can change independently after the clone operation. One
+use case for this feature is testing. For example to test a new filtering
iterator, clone the table, add the filter to the clone, and force a major
-compaction.  To perform a test on less data, clone a table and then use delete
-range to efficiently remove a lot of data from the clone.  Another use case is
-generating a snapshot to guard against human error.  To create a snapshot,
+compaction. To perform a test on less data, clone a table and then use delete
+range to efficiently remove a lot of data from the clone. Another use case is
+generating a snapshot to guard against human error. To create a snapshot,
clone a table and then disable write permissions on the clone.

-The clone operation will point to the source table's files.  This is why the
-flush option is present and is enabled by default in the shell.  If the flush
+The clone operation will point to the source table's files. This is why the
+flush option is present and is enabled by default in the shell. If the flush
option is not enabled, then any data the source table currently has in memory
will not exist in the clone.

-A cloned table copies the configuration of the source table.  However the
-permissions of the source table are not copied to the clone.  After a clone is
+A cloned table copies the configuration of the source table. However the
+permissions of the source table are not copied to the clone. After a clone is
created, only the user that created the clone can read and write to it.

In the following example we see that data inserted after the clone operation is
@@ -655,10 +655,10 @@ root@a14 test>

The du command in the shell shows how much space a table is using in HDFS.
This command can also show how much overlapping space two cloned tables have in
-HDFS.  In the example below du shows table ci is using 428M.  Then ci is cloned
-to cic and du shows that both tables share 428M.  After three entries are
+HDFS. In the example below du shows table ci is using 428M. Then ci is cloned
+to cic and du shows that both tables share 428M. After three entries are
inserted into cic and it is flushed, du shows the two tables still share 428M but
-cic has 226 bytes to itself.  Finally, table cic is compacted and then du shows
+cic has 226 bytes to itself. Finally, table cic is compacted and then du shows
that each table uses 428M.

\small
@@ -690,9 +690,9 @@ root@a14 cic>
\section{Exporting Tables}

Accumulo supports exporting tables for the purpose of copying tables to another
-cluster.  Exporting and importing tables preserves the table's configuration,
-splits, and logical time.  Tables are exported and then copied via the hadoop
-distcp command.  To export a table, it must be offline and stay offline while
-distcp runs.  The reason it needs to stay offline is to prevent files from being
-deleted.  A table can be cloned and the clone taken offline in order to avoid
+cluster. Exporting and importing tables preserves the table's configuration,
+splits, and logical time. Tables are exported and then copied via the hadoop
+distcp command. To export a table, it must be offline and stay offline while
+distcp runs. The reason it needs to stay offline is to prevent files from being
+deleted. A table can be cloned and the clone taken offline in order to avoid

Modified: accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/table_design.tex
URL: http://svn.apache.org/viewvc/accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/table_design.tex?rev=1482357&r1=1482356&r2=1482357&view=diff
==============================================================================
--- accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/table_design.tex (original)
+++ accumulo/branches/1.5/docs/src/main/latex/accumulo_user_manual/chapters/table_design.tex Tue May 14 14:47:33 2013
@@ -1,10 +1,10 @@

% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements.  See the NOTICE file distributed with
+% contributor license agreements. See the NOTICE file distributed with