accumulo-commits mailing list archives

From mwa...@apache.org
Subject [1/5] accumulo-website git commit: Organized documentation
Date Fri, 26 May 2017 14:17:26 GMT
Repository: accumulo-website
Updated Branches:
  refs/heads/asf-site 2c7f1e8cd -> 9ebc5f9a1
  refs/heads/master 29778dd0d -> 817a0ef72


Organized documentation

* Moved iterator_testing.md content to development_tools.md
* Moved proxy docs from client.md to new proxy.md
* Renamed analytics.md to mapreduce.md and moved combiner docs to
  iterators.md
* Reordered development docs


Project: http://git-wip-us.apache.org/repos/asf/accumulo-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo-website/commit/817a0ef7
Tree: http://git-wip-us.apache.org/repos/asf/accumulo-website/tree/817a0ef7
Diff: http://git-wip-us.apache.org/repos/asf/accumulo-website/diff/817a0ef7

Branch: refs/heads/master
Commit: 817a0ef7238c66bd48dcb184470998b3a1463b19
Parents: 29778dd
Author: Mike Walch <mwalch@apache.org>
Authored: Fri May 26 10:12:21 2017 -0400
Committer: Mike Walch <mwalch@apache.org>
Committed: Fri May 26 10:12:21 2017 -0400

----------------------------------------------------------------------
 _docs-unreleased/development/analytics.md       | 226 ----------
 .../development/development_tools.md            |  96 ++++-
 .../development/high_speed_ingest.md            |   2 +-
 _docs-unreleased/development/iterator_design.md | 386 -----------------
 .../development/iterator_testing.md             |  97 -----
 _docs-unreleased/development/iterators.md       | 419 +++++++++++++++++++
 _docs-unreleased/development/mapreduce.md       | 181 ++++++++
 _docs-unreleased/development/proxy.md           | 121 ++++++
 _docs-unreleased/development/sampling.md        |   2 +-
 _docs-unreleased/development/security.md        |   2 +-
 _docs-unreleased/development/summaries.md       |   2 +-
 _docs-unreleased/getting-started/clients.md     | 120 +-----
 12 files changed, 828 insertions(+), 826 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/analytics.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/analytics.md b/_docs-unreleased/development/analytics.md
deleted file mode 100644
index e579bf6..0000000
--- a/_docs-unreleased/development/analytics.md
+++ /dev/null
@@ -1,226 +0,0 @@
----
-title: Analytics
-category: development
-order: 8
----
-
-Accumulo supports more advanced data processing than simply keeping keys
-sorted and performing efficient lookups. Analytics can be developed by using
-MapReduce and Iterators in conjunction with Accumulo tables.
-
-## MapReduce
-
-Accumulo tables can be used as the source and destination of MapReduce jobs. To
-use an Accumulo table with a MapReduce job (specifically with the new Hadoop API
-as of version 0.20), configure the job parameters to use the AccumuloInputFormat
-and AccumuloOutputFormat. Accumulo specific parameters can be set via these
-two format classes to do the following:
-
-* Authenticate and provide user credentials for the input
-* Restrict the scan to a range of rows
-* Restrict the input to a subset of available columns
-
-### Mapper and Reducer classes
-
-To read from an Accumulo table create a Mapper with the following class
-parameterization and be sure to configure the AccumuloInputFormat.
-
-```java
-class MyMapper extends Mapper<Key,Value,WritableComparable,Writable> {
-    public void map(Key k, Value v, Context c) {
-        // transform key and value data here
-    }
-}
-```
-
-To write to an Accumulo table, create a Reducer with the following class
-parameterization and be sure to configure the AccumuloOutputFormat. The key
-emitted from the Reducer identifies the table to which the mutation is sent. This
-allows a single Reducer to write to more than one table if desired. A default table
-can be configured using the AccumuloOutputFormat, in which case the output table
-name does not have to be passed to the Context object within the Reducer.
-
-```java
-class MyReducer extends Reducer<WritableComparable, Writable, Text, Mutation> {
-    public void reduce(WritableComparable key, Iterable<Writable> values, Context c) {
-        Mutation m;
-        // create the mutation based on input key and value
-        c.write(new Text("output-table"), m);
-    }
-}
-```
-
-The Text object passed as the output should contain the name of the table to which
-this mutation should be applied. The Text can be null in which case the mutation
-will be applied to the default table name specified in the AccumuloOutputFormat
-options.
-
-### AccumuloInputFormat options
-
-```java
-Job job = new Job(getConf());
-AccumuloInputFormat.setInputInfo(job,
-        "user",
-        "passwd".getBytes(),
-        "table",
-        new Authorizations());
-
-AccumuloInputFormat.setZooKeeperInstance(job, "myinstance",
-        "zooserver-one,zooserver-two");
-```
-
-**Optional Settings:**
-
-To restrict Accumulo to a set of row ranges:
-
-```java
-ArrayList<Range> ranges = new ArrayList<Range>();
-// populate array list of row ranges ...
-AccumuloInputFormat.setRanges(job, ranges);
-```
-
-To restrict Accumulo to a list of columns:
-
-```java
-ArrayList<Pair<Text,Text>> columns = new ArrayList<Pair<Text,Text>>();
-// populate list of columns
-AccumuloInputFormat.fetchColumns(job, columns);
-```
-
-To use a regular expression to match row IDs:
-
-```java
-IteratorSetting is = new IteratorSetting(30, RegExFilter.class);
-RegExFilter.setRegexs(is, ".*suffix", null, null, null, true);
-AccumuloInputFormat.addIterator(job, is);
-```
-
-### AccumuloMultiTableInputFormat options
-
-The AccumuloMultiTableInputFormat allows scanning over multiple tables
-in a single MapReduce job. Separate ranges, columns, and iterators can be
-used for each table.
-
-```java
-InputTableConfig tableOneConfig = new InputTableConfig();
-InputTableConfig tableTwoConfig = new InputTableConfig();
-```
-
-To set the configuration objects on the job:
-
-```java
-Map<String, InputTableConfig> configs = new HashMap<String,InputTableConfig>();
-configs.put("table1", tableOneConfig);
-configs.put("table2", tableTwoConfig);
-AccumuloMultiTableInputFormat.setInputTableConfigs(job, configs);
-```
-
-**Optional settings:**
-
-To restrict to a set of ranges:
-
-```java
-ArrayList<Range> tableOneRanges = new ArrayList<Range>();
-ArrayList<Range> tableTwoRanges = new ArrayList<Range>();
-// populate array lists of row ranges for tables...
-tableOneConfig.setRanges(tableOneRanges);
-tableTwoConfig.setRanges(tableTwoRanges);
-```
-
-To restrict Accumulo to a list of columns:
-
-```java
-ArrayList<Pair<Text,Text>> tableOneColumns = new ArrayList<Pair<Text,Text>>();
-ArrayList<Pair<Text,Text>> tableTwoColumns = new ArrayList<Pair<Text,Text>>();
-// populate lists of columns for each of the tables ...
-tableOneConfig.fetchColumns(tableOneColumns);
-tableTwoConfig.fetchColumns(tableTwoColumns);
-```
-
-To set scan iterators:
-
-```java
-List<IteratorSetting> tableOneIterators = new ArrayList<IteratorSetting>();
-List<IteratorSetting> tableTwoIterators = new ArrayList<IteratorSetting>();
-// populate the lists of iterator settings for each of the tables ...
-tableOneConfig.setIterators(tableOneIterators);
-tableTwoConfig.setIterators(tableTwoIterators);
-```
-
-The name of the table can be retrieved from the input split:
-
-```java
-class MyMapper extends Mapper<Key,Value,WritableComparable,Writable> {
-    public void map(Key k, Value v, Context c) {
-        RangeInputSplit split = (RangeInputSplit)c.getInputSplit();
-        String tableName = split.getTableName();
-        // do something with table name
-    }
-}
-```
-
-### AccumuloOutputFormat options
-
-```java
-boolean createTables = true;
-String defaultTable = "mytable";
-
-AccumuloOutputFormat.setOutputInfo(job,
-        "user",
-        "passwd".getBytes(),
-        createTables,
-        defaultTable);
-
-AccumuloOutputFormat.setZooKeeperInstance(job, "myinstance",
-        "zooserver-one,zooserver-two");
-```
-
-**Optional Settings:**
-
-```java
-AccumuloOutputFormat.setMaxLatency(job, 300000); // milliseconds
-AccumuloOutputFormat.setMaxMutationBufferSize(job, 50000000); // bytes
-```
-
-The [MapReduce example](https://github.com/apache/accumulo-examples/blob/master/docs/mapred.md)
-contains a complete example of using MapReduce with Accumulo.
-
-## Combiners
-
-Many applications can benefit from the ability to aggregate values across common
-keys. This can be done via Combiner iterators and is similar to the Reduce step in
-MapReduce. This provides the ability to define online, incrementally updated
-analytics without the overhead or latency associated with batch-oriented
-MapReduce jobs.
-
-All that is needed to aggregate values of a table is to identify the fields over which
-values will be grouped, insert mutations with those fields as the key, and configure
-the table with a combining iterator that supports the summarizing operation
-desired.
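-
-For example, a `SummingCombiner` could be attached through the client API; the `Connector` named `conn`, the table
-name, and the column family used below are illustrative:
-
-```java
-// Sum values in the "stats" column family; priority 10 runs before the
-// VersioningIterator (priority 20 by default), so all versions are combined
-IteratorSetting setting = new IteratorSetting(10, "sum", SummingCombiner.class);
-SummingCombiner.setColumns(setting, Collections.singletonList(new IteratorSetting.Column("stats")));
-SummingCombiner.setEncodingType(setting, LongCombiner.Type.STRING);
-conn.tableOperations().attachIterator("counts", setting);
-```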
-
-The only restriction on a combining iterator is that the combiner developer
-should not assume that all values for a given key have been seen, since new
-mutations can be inserted at any time. This precludes, for example, relying on the total
-number of values in the aggregation, such as when calculating an average.
-
-### Feature Vectors
-
-An interesting use of combining iterators within an Accumulo table is to store
-feature vectors for use in machine learning algorithms. For example, many
-algorithms such as k-means clustering, support vector machines, anomaly detection,
-etc. use the concept of a feature vector and the calculation of distance metrics to
-learn a particular model. The columns in an Accumulo table can be used to efficiently
-store sparse features and their weights to be incrementally updated via the use of a
-combining iterator.
-
-## Statistical Modeling
-
-Statistical models that need to be updated by many machines in parallel could be
-similarly stored within an Accumulo table. For example, a MapReduce job that is
-iteratively updating a global statistical model could have each map or reduce worker
-reference the parts of the model to be read and updated through an embedded
-Accumulo client.
-
-Using Accumulo this way enables efficient and fast lookups and updates of small
-pieces of information in a random access pattern, which is complementary to
-MapReduce's sequential access model.

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/development_tools.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/development_tools.md b/_docs-unreleased/development/development_tools.md
index 3e326e2..f9768f6 100644
--- a/_docs-unreleased/development/development_tools.md
+++ b/_docs-unreleased/development/development_tools.md
@@ -1,7 +1,7 @@
 ---
 title: Development Tools
 category: development
-order: 3
+order: 4
 ---
 
 Normally, Accumulo consists of lots of moving parts. Even a stand-alone version of
@@ -9,6 +9,100 @@ Accumulo requires Hadoop, Zookeeper, the Accumulo master, a tablet server, etc.
 you want to write a unit test that uses Accumulo, you need a lot of infrastructure
 in place before your test can run.
 
+## Iterator Test Harness
+
+Iterators, while extremely powerful, are notoriously difficult to test. While the API defines
+the methods an Iterator must implement and each method's functionality, the actual invocation
+of these methods by Accumulo TabletServers can be surprisingly difficult to mimic in unit tests.
+
+The Apache Accumulo "Iterator Test Harness" is designed to provide a generalized testing framework
+for all Accumulo Iterators to leverage in order to identify common pitfalls in user-created Iterators.
+
+### Framework Use
+
+The harness provides an abstract class for use with JUnit4. Users must define the following for this
+abstract class:
+
+  * A `SortedMap` of input data (`Key`-`Value` pairs)
+  * A `Range` to use in tests
+  * A `Map` of options (`String` to `String` pairs)
+  * A `SortedMap` of output data (`Key`-`Value` pairs)
+  * A list of `IteratorTestCase`s (these can be automatically discovered)
+
+The majority of effort a user must make is in creating the input dataset and the expected
+output dataset for the iterator being tested.
+
+### Normal Test Outline
+
+Most iterator tests will follow the given outline:
+
+```java
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Range;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.iteratortest.IteratorTestCaseFinder;
+import org.apache.accumulo.iteratortest.IteratorTestInput;
+import org.apache.accumulo.iteratortest.IteratorTestOutput;
+import org.apache.accumulo.iteratortest.junit4.BaseJUnit4IteratorTest;
+import org.apache.accumulo.iteratortest.testcases.IteratorTestCase;
+import org.junit.runners.Parameterized.Parameters;
+
+public class MyIteratorTest extends BaseJUnit4IteratorTest {
+
+  @Parameters
+  public static Object[][] parameters() {
+    final IteratorTestInput input = createIteratorInput();
+    final IteratorTestOutput output = createIteratorOutput();
+    final List<IteratorTestCase> testCases = IteratorTestCaseFinder.findAllTestCases();
+    return BaseJUnit4IteratorTest.createParameters(input, output, testCases);
+  }
+
+  private static SortedMap<Key,Value> INPUT_DATA = createInputData();
+  private static SortedMap<Key,Value> OUTPUT_DATA = createOutputData();
+
+  private static SortedMap<Key,Value> createInputData() {
+    // TODO -- implement this method
+  }
+
+  private static SortedMap<Key,Value> createOutputData() {
+    // TODO -- implement this method
+  }
+
+  private static IteratorTestInput createIteratorInput() {
+    final Map<String,String> options = createIteratorOptions(); 
+    final Range range = createRange();
+    return new IteratorTestInput(MyIterator.class, options, range, INPUT_DATA);
+  }
+
+  private static Map<String,String> createIteratorOptions() {
+    // TODO -- implement this method
+    // Tip: Use INPUT_DATA if helpful in generating output
+  }
+
+  private static Range createRange() {
+    // TODO -- implement this method
+  }
+
+  private static IteratorTestOutput createIteratorOutput() {
+    return new IteratorTestOutput(OUTPUT_DATA);
+  }
+}
+```
+
+### Limitations
+
+While the provided `IteratorTestCase`s should exercise common edge-cases in user iterators,
+there are still many limitations to the existing test harness. Some of them are:
+
+  * Can only specify a single iterator, not many (a "stack")
+  * No control over provided IteratorEnvironment for tests
+  * No way to exercise delete keys (especially with major compactions that do not include all files)
+
+These are left as future improvements to the harness.
+
 ## Mock Accumulo
 
 Mock Accumulo supplies mock implementations for much of the client API. It presently

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/high_speed_ingest.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/high_speed_ingest.md b/_docs-unreleased/development/high_speed_ingest.md
index 7d906a0..f52f501 100644
--- a/_docs-unreleased/development/high_speed_ingest.md
+++ b/_docs-unreleased/development/high_speed_ingest.md
@@ -1,7 +1,7 @@
 ---
 title: High-Speed Ingest
 category: development
-order: 7
+order: 8
 ---
 
 Accumulo is often used as part of a larger data processing and storage system. To

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/iterator_design.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/iterator_design.md b/_docs-unreleased/development/iterator_design.md
deleted file mode 100644
index cfb46c8..0000000
--- a/_docs-unreleased/development/iterator_design.md
+++ /dev/null
@@ -1,386 +0,0 @@
----
-title: Iterator Design
-category: development
-order: 1
----
-
-Accumulo SortedKeyValueIterators, commonly referred to as Iterators for short, are server-side programming constructs
-that allow users to implement custom retrieval or computational logic within Accumulo TabletServers. The name rightly
-suggests similarities to the Java Iterator interface; however, Accumulo Iterators are more complex than Java
-Iterators. Notably, in addition to the expected methods to retrieve the current element and advance to the next element
-in the iteration, Accumulo Iterators must also support the ability to "move" (`seek`) to a specified point in the
-iteration (the Accumulo table). Accumulo Iterators are designed to be concatenated together, similar to applying a
-series of transformations to a list of elements. Accumulo Iterators can duplicate their underlying source to create
-multiple "pointers" over the same underlying data (which is extremely powerful since each stream is sorted) or they can
-merge multiple Iterators into a single view. In this sense, a collection of Iterators operating in tandem is closer to
-a tree-structure than a list, but there is always a sense of a flow of Key-Value pairs through some Iterators. Iterators
-are not designed to act as triggers nor are they designed to operate outside of the purview of a single table.
-
-Understanding how TabletServers invoke the methods on a SortedKeyValueIterator can be difficult, as the actual code is
-buried within the implementation of the TabletServer; however, a deep understanding of this is generally unnecessary
-because the interface provides clear definitions of what action each method should take. This
-chapter aims to provide a more detailed description of how Iterators are invoked, some best practices and some common
-pitfalls.
-
-## Instantiation
-
-To invoke an Accumulo Iterator inside of the TabletServer, the Iterator class must be on the classpath of every
-TabletServer. For production environments, it is common to place a JAR file which contains the Iterator in
-`lib/`.  In development environments, it is convenient to instead place the JAR file in `lib/ext/` as JAR files
-in this directory are dynamically reloaded by the TabletServers alleviating the need to restart Accumulo while
-testing an Iterator. Advanced classloader features which enable other types of filesystems and per-table classpath
-configurations (as opposed to process-wide classpaths). These features are not covered here, but elsewhere in the user
-manual.
-
-Accumulo references the Iterator class by name and uses Java reflection to instantiate the Iterator. This means that
-Iterators must have a public no-args constructor.
-
-## Interface
-
-A normal implementation of the SortedKeyValueIterator defines functionality for the following methods:
-
-```java
-void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env) throws IOException;
-
-boolean hasTop();
-
-void next() throws IOException;
-
-void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException;
-
-Key getTopKey();
-
-Value getTopValue();
-
-SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env);
-```
-
-### init
-
-The `init` method is called by the TabletServer after it constructs an instance of the Iterator.  This method should
-clear/reset any internal state in the Iterator and prepare it to process data.  The first argument, the `source`, is the
-Iterator "below" this Iterator (where the client is at "top" and the Iterator for files in HDFS are at the "bottom").
-The "source" Iterator provides the Key-Value pairs which this Iterator will operate upon.
-
-The second argument, a Map of options, is made up of options provided by the user, options set in the table's
-configuration, and/or options set in the containing namespace's configuration.
-These options allow for Iterators to dynamically configure themselves on the fly. If no options are used in the current context
-(a Scan or Compaction), the Map will be empty. An example of a configuration item for an Iterator could be a pattern used to filter
-Key-Value pairs in a regular expression Iterator.
-
-The third argument, the `IteratorEnvironment`, is a special object which provides information to this Iterator about the
-context in which it was invoked. Commonly, this information is not necessary to inspect. For example, if an Iterator
-knows that it is running in the context of a full-major compaction (reading all of the data) as opposed to a user scan
-(which may strongly limit the number of columns), the Iterator might make different algorithmic decisions in an attempt to
-optimize itself.
-
-### seek
-
-The `seek` method is likely the most confusing method on the Iterator interface. The purpose of this method is to
-advance the stream of Key-Value pairs to a certain point in the iteration (the Accumulo table). It is common that before
-the implementation of this method returns some additional processing is performed which may further advance the current
-position past the `startKey` of the `Range`. This, however, is dependent on the functionality the iterator provides. For
-example, a filtering iterator would consume a number of Key-Value pairs which do not meet its criteria before `seek`
-returns. The important condition for `seek` to meet is that this Iterator should be ready to return the first Key-Value
-pair, or none if no such pair is available, when the method returns. The Key-Value pair would be returned by `getTopKey`
-and `getTopValue`, respectively, and `hasTop` should return a boolean denoting whether or not there is
-a Key-Value pair to return.
-
-The arguments passed to seek are as follows:
-
-The TabletServer first provides a `Range`, an object which defines some collection of Accumulo `Key`s, which defines the
-Key-Value pairs that this Iterator should return. Each `Range` has a `startKey` and `endKey` with an inclusive flag for
-both. While this Range is often similar to the Range(s) set by the client on a Scanner or BatchScanner, it is not
-guaranteed to be a Range that the client set. Accumulo will split up larger ranges and group them together based on
-Tablet boundaries per TabletServer. Iterators should not attempt to implement any custom logic based on the Range(s)
-provided to `seek` and Iterators should not return any Keys that fall outside of the provided Range.
-
-The second argument, a `Collection<ByteSequence>`, is the set of column families which should be retained or
-excluded by this Iterator. The third argument, a boolean, defines whether the collection of column families
-should be treated as an inclusion collection (true) or an exclusion collection (false).
-
-It is likely that all implementations of `seek` will first make a call to the `seek` method on the
-"source" Iterator that was provided in the `init` method. The collection of column families and
-the boolean `include` argument should be passed down as well as the `Range`. Somewhat commonly, the Iterator will
-also implement some sort of additional logic to find or compute the first Key-Value pair in the provided
-Range. For example, a regular expression Iterator would consume all records which do not match the given
-pattern before returning from `seek`.
-
-It is important to retain the original Range passed to this method to know when this Iterator should stop
-reading more Key-Value pairs. Ignoring this typically does not affect scans from a Scanner, but it
-will result in duplicate keys emitting from a BatchScan if the scanned table has more than one tablet.
-Best practice is to never emit entries outside the seek range.
-
-### next
-
-The `next` method is analogous to the `next` method on a Java Iterator: this method should advance
-the Iterator to the next Key-Value pair. For implementations that perform some filtering or complex
-logic, this may result in more than one Key-Value pair being inspected. This method alters
-some internal state that is exposed via the `hasTop`, `getTopKey`, and `getTopValue` methods.
-
-The result of this method is commonly caching a Key-Value pair which `getTopKey` and `getTopValue`
-can later return. While there is another Key-Value pair to return, `hasTop` should return true.
-If there are no more Key-Value pairs to return from this Iterator since the last call to
-`seek`, `hasTop` should return false.
-
-### hasTop
-
-The `hasTop` method is similar to the `hasNext` method on a Java Iterator in that it informs
-the caller if there is a Key-Value pair to be returned. If there is no pair to return, this method
-should return false. Like a Java Iterator, multiple calls to `hasTop` (without calling `next`) should not
-alter the internal state of the Iterator.
-
-### getTopKey and getTopValue
-
-These methods simply return the current Key-Value pair for this iterator. If `hasTop` returns true,
-both of these methods should return non-null objects. If `hasTop` returns false, it is undefined
-what these methods should return. Like `hasTop`, multiple calls to these methods should not alter
-the state of the Iterator.
-
-Users should take caution when either
-
-1. caching the Key/Value from `getTopKey`/`getTopValue`, for use after calling `next` on the source iterator.
-In this case, the cached Key/Value object is aliased to the reference returned by the source iterator.
-Iterators may reuse the same Key/Value object in a `next` call for performance reasons, changing the data
-that the cached Key/Value object references and resulting in a logic bug.
-2. modifying the Key/Value from `getTopKey`/`getTopValue`. If the source iterator reuses data stored in the Key/Value,
-then the source iterator may use the modified data that the Key/Value references. This may/may not result in a logic bug.
-
-In both cases, copying the Key/Value's data into a new object ensures iterator correctness. If neither case applies,
-it is safe to not copy the Key/Value.  The general guideline is to be aware of who else may use Key/Value objects
-returned from `getTopKey`/`getTopValue`.
-
-### deepCopy
-
-The `deepCopy` method is similar to the `clone` method from the Java `Cloneable` interface.
-Implementations of this method should return a new object of the same type as the Accumulo Iterator
-instance it was called on. Any internal state from the instance `deepCopy` was called
-on should be carried over to the returned copy. The returned copy should be ready to have
-`seek` called on it. The SortedKeyValueIterator interface guarantees that `init` will be called on
-an iterator before `deepCopy` and that `init` will not be called on the iterator returned by
-`deepCopy`.
-
-Typically, implementations of `deepCopy` call a copy-constructor which will initialize
-internal data structures. As with `seek`, it is common for the `IteratorEnvironment`
-argument to be ignored as most Iterator implementations can be written without the explicit
-information the environment provides.
-
-In the analogy of a series of Iterators representing a tree, `deepCopy` can be thought of as
-early programming assignments which implement their own tree data structures. `deepCopy` calls
-copy on its sources (the children), copies itself, attaches the copies of the children, and
-then returns itself.
-
-## TabletServer invocation of Iterators
-
-The following code is a general outline for how TabletServers invoke Iterators.
-
-```java
-List<KeyValue> batch;
-Range range = getRangeFromClient();
-while(!overSizeLimit(batch)){
- SortedKeyValueIterator source = getSystemIterator();
-
- for(String clzName : getUserIterators()){
-  Class<?> clz = Class.forName(clzName);
-  SortedKeyValueIterator iter = (SortedKeyValueIterator) clz.newInstance();
-  iter.init(source, opts, env);
-  source = iter;
- }
-
- // read a batch of data to return to client
- // the last iterator, the "top"
- SortedKeyValueIterator topIter = source;
- topIter.seek(getRangeFromUser(), ...);
-
- while(topIter.hasTop() && !overSizeLimit(batch)){
-   Key key = topIter.getTopKey();
-   Value val = topIter.getTopValue();
-   batch.add(new KeyValue(key, val));
-   if(systemDataSourcesChanged()){
-     // code does not show isolation case, which will
-     // keep using same data sources until a row boundary is hit
-     range = new Range(key, false, range.getEndKey(), range.isEndKeyInclusive());
-     break;
-   }
- }
-}
-//return batch of key values to client
-```
-
-Additionally, the obtuse "re-seek" case can be outlined as the following:
-
-```java
-// Given the above
-List<KeyValue> batch = getNextBatch();
-
-// Store off lastKeyReturned for this client
-lastKeyReturned = batch.get(batch.size() - 1).getKey();
-
-// thread goes away (client stops asking for the next batch).
-
-// Eventually client comes back
-// Setup as before...
-
-Range userRange = getRangeFromUser();
-Range actualRange = new Range(lastKeyReturned, false,
-    userRange.getEndKey(), userRange.isEndKeyInclusive());
-
-// Use the actualRange, not the user provided one
-topIter.seek(actualRange);
-```
-
-## Isolation
-
-Accumulo provides a feature which clients can enable to prevent the viewing of partially
-applied mutations within the context of rows. If a client is submitting multiple column
-updates to rows at a time, isolation would ensure that a client would either see all of
-the updates made to that row or none of them (until they are all applied).
-
-When using Isolation, there are additional concerns in iterator design. A scan time iterator in Accumulo
-reads from a set of data sources. While an iterator is reading data, it has an isolated view. However, after it returns a
-key/value it is possible that Accumulo may switch data sources and re-seek the iterator. This is done so that resources
-may be reclaimed. When the user does not request isolation this can occur after any key is returned. When a user enables
-Isolation, this will only occur after a new row is returned, in which case it will re-seek to the very beginning of the
-next possible row.
-
-## Abstract Iterators
-
-A number of Abstract implementations of Iterators are provided to allow for faster creation
-of common patterns. The most commonly used abstract implementations are the `Filter` and
-`Combiner` classes. When possible these classes should be used instead as they have been
-thoroughly tested inside Accumulo itself.
-
-### Filter
-
-The `Filter` abstract Iterator provides a very simple implementation which allows implementations
-to define whether or not a Key-Value pair should be returned via an `accept(Key, Value)` method.
-
-Filters are extremely simple to implement; however, when the implementation is filtering a
-large percentage of Key-Value pairs with respect to the total number of pairs examined,
-it can be very inefficient. For example, if a Filter implementation can determine after examining
-part of the row that no other pairs in this row will be accepted, there is no mechanism to
-efficiently skip the remaining Key-Value pairs. Concretely, take a row which is comprised of
-1000 Key-Value pairs. After examining the first 10 Key-Value pairs, it is determined
-that no other Key-Value pairs in this row will be accepted. The Filter must still examine each of the
-remaining 990 Key-Value pairs in this row. Another way to express this deficiency is that
-Filters have no means to leverage the `seek` method to efficiently skip large portions
-of Key-Value pairs.
-
-As such, the `Filter` class functions well for filtering small amounts of data, but is
-inefficient for filtering large amounts of data. The decision to use a `Filter` strongly
-depends on the use case and distribution of data being filtered.
-
-### Combiner
-
-The `Combiner` class is another common abstract Iterator. Similar to the `Combiner` interface
-defined in Hadoop's MapReduce framework, implementations of this abstract class reduce
-multiple Values for different versions of a Key (Keys which only differ by timestamps) into one Key-Value pair.
-Combiners provide a simple way to implement common operations like summation and
-aggregation without the need to implement the entire Accumulo Iterator interface.
-
-One important consideration when choosing to design a Combiner is that the "reduction" operation
-is often best represented when it is associative and commutative. Operations which do not meet
-these criteria can be implemented; however, the implementation can be difficult.
-
-A second consideration is that a Combiner is not guaranteed to see all of the Key-Value pairs
-which differ only by timestamp every time it is invoked. For example, if there are 5 Key-Value
-pairs in a table which only differ by the timestamps 1, 2, 3, 4, and 5, it is not guaranteed that
-every invocation of the Combiner will see 5 timestamps. One invocation might see the Values for
-Keys with timestamp 1 and 4, while another invocation might see the Values for Keys with the
-timestamps 1, 2, 4 and 5.
-
-Finally, when configuring an Accumulo table to use a Combiner, be sure to disable the Versioning Iterator or set the
-Combiner at a priority less than the Versioning Iterator (which is added at a priority of 20 by default). The
-Versioning Iterator will filter out multiple Key-Value pairs that differ only by timestamp and return only the Key-Value
-pair that has the largest timestamp.
-
-## Best practices
-
-Because of the flexibility that the `SortedKeyValueIterator` interface provides, it doesn't directly disallow
-many implementations which are poor design decisions. The following are some common recommendations to
-follow and pitfalls to avoid in Iterator implementations.
-
-#### Avoid special logic encoded in Ranges
-
-Commonly, granular Ranges that a client passes to an Iterator from a `Scanner` or `BatchScanner` are unmodified.
-If a `Range` falls within the boundaries of a Tablet, an Iterator will often see that same Range in the
-`seek` method. However, there is no guarantee that the `Range` will remain unaltered from client to server. As such, Iterators
-should *never* make assumptions about the current state/context based on the `Range`.
-
-The common failure condition is referred to as a "re-seek". In the context of a Scan, TabletServers construct the
-"stack" of Iterators and batch up Key-Value pairs to send back to the client. When a sufficient number of Key-Value
-pairs are collected, it is common for the Iterators to be "torn down" until the client asks for the next batch of
-Key-Value pairs. This is done by the TabletServer to add fairness in ensuring one Scan does not monopolize the available
-resources. When the client asks for the next batch, the implementation modifies the original Range so that servers know
-the point to resume the iteration (to avoid returning duplicate Key-Value pairs). Specifically, the new Range is created
-from the original but is shortened by setting the startKey of the original Range to the Key last returned by the Scan,
-non-inclusive.
-
-### `seek`'ing backwards
-
-The ability for an Iterator to "skip over" large blocks of Key-Value pairs is a major tenet behind Iterators.
-`seek`'ing past a collection of Key-Value pairs which is known to be ignorable can
-greatly increase the speed of a scan, as many Key-Value pairs do not have to be deserialized and processed.
-
-While the `seek` method provides the `Range` that should be used to `seek` the underlying source Iterator,
-there is no guarantee that the implementing Iterator uses that `Range` to perform the `seek` on its
-"source" Iterator. As such, it is possible to seek to any `Range` and the interface has no assertions
-to prevent this from happening.
-
-Since Iterators are allowed to `seek` to arbitrary Keys, it is also possible for Iterators to create infinite loops
-inside Scans that will repeatedly read the same data without end. Constructing a completely new `Range`, rather than
-deriving it from the `Range` provided to `seek`, allows for bugs to be introduced which will break Accumulo.
-
-Thus, `seek`'s should always be thought of as making "forward progress" in the view of the total iteration. The
-`startKey` of a `Range` should always be greater than the current Key seen by the Iterator while the `endKey` of the
-`Range` should always retain the original `endKey` (and `endKey` inclusivity) of the last `Range` seen by your
-Iterator's implementation of seek.
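-
-A sketch of constructing such a forward-progress `Range` (the variable names below are illustrative) could be:
-
-```java
-// Skip ahead, but stay within the bounds of the last Range passed to seek:
-// start just after the current Key and keep the original endKey and its inclusivity
-Range skipAhead = new Range(currentTopKey, false,
-    lastSeekRange.getEndKey(), lastSeekRange.isEndKeyInclusive());
-source.seek(skipAhead, columnFamilies, inclusive);
-```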
-
-### Take caution in constructing new data in an Iterator
-
-Implementations of Iterator might be tempted to open BatchWriters inside of an Iterator as a means
-to implement triggers for writing additional data outside of their client application. The lifecycle of an Iterator
-is *not* managed in such a way that guarantees that this is safe nor efficient. Specifically, there
-is no way to guarantee that the internal ThreadPool inside of the BatchWriter is closed (and the thread(s)
-are reaped) without calling the close() method. `close`'ing and recreating a `BatchWriter` after every
-Key-Value pair is also far too costly to be considered an option.
-
-The only safe way to generate additional data in an Iterator is to alter the current Key-Value pair.
-For example, the `WholeRowIterator` serializes all of the Key-Value pairs that fall within each
-row. A safe way to generate more data in an Iterator would be to construct an Iterator that is
-"higher" (at a larger priority) than the `WholeRowIterator`, that is, the Iterator receives the Key-Value pairs which are
-a serialization of many Key-Value pairs. The custom Iterator could deserialize the pairs, compute
-some function, and add a new Key-Value pair to the original collection, re-serializing the collection
-of Key-Value pairs back into a single Key-Value pair.
-
-Any other situation is likely not guaranteed to ensure that the caller (a Scan or a Compaction) will
-always see all intended data that is generated.
-
-## Final things to remember
-
-Some simple recommendations/points to keep in mind:
-
-### Method call order
-
-On an instance of an Iterator: `init` is always called before `seek`; `seek` is always called before `hasTop`; and
-`getTopKey` and `getTopValue` will not be called if `hasTop` returns false.
-
-### Teardown
-
-As mentioned, instances of Iterators may be torn down inside of the server transparently. When a complex
-collection of iterators is performing some advanced functionality, they will not be torn down until a Key-Value
-pair is returned out of the "stack" of Iterators (and added into the batch of Key-Values to be returned
-to the caller). Being torn down is equivalent to a new instance of the Iterator being created and `deepCopy`
-being called on the new instance with the old instance provided as the argument to `deepCopy`. References
-to the old instance are removed and the object is lazily garbage collected by the JVM.
-
-## Compaction-time Iterators
-
-When Iterators are configured to run during compactions, at the `minc` or `majc` scope, these Iterators sometimes need
-to make different assertions than those which only operate at scan time. Iterators won't see the delete entries; however,
-Iterators will not necessarily see all of the Key-Value pairs in every invocation. Because compactions often do not rewrite
-all files (only a subset of them), the logic must take this into consideration.
-
-For example, a Combiner that runs over data during compactions might not see all of the values for a given Key. The
-Combiner must recognize this and not perform any function that would be incorrect due
-to the missing values.

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/iterator_testing.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/iterator_testing.md b/_docs-unreleased/development/iterator_testing.md
deleted file mode 100644
index a0e82de..0000000
--- a/_docs-unreleased/development/iterator_testing.md
+++ /dev/null
@@ -1,97 +0,0 @@
----
-title: Iterator Testing
-category: development
-order: 2
----
-
-Iterators, while extremely powerful, are notoriously difficult to test. While the API defines
-the methods an Iterator must implement and each method's functionality, the actual invocation
-of these methods by Accumulo TabletServers can be surprisingly difficult to mimic in unit tests.
-
-The Apache Accumulo "Iterator Test Harness" is designed to provide a generalized testing framework
-for all Accumulo Iterators to leverage in order to identify common pitfalls in user-created Iterators.
-
-## Framework Use
-
-The harness provides an abstract class for use with JUnit4. Users must define the following for this
-abstract class:
-
-  * A `SortedMap` of input data (`Key`-`Value` pairs)
-  * A `Range` to use in tests
-  * A `Map` of options (`String` to `String` pairs)
-  * A `SortedMap` of output data (`Key`-`Value` pairs)
-  * A list of `IteratorTestCase`s (these can be automatically discovered)
-
-The majority of effort a user must make is in creating the input dataset and the expected
-output dataset for the iterator being tested.
-
-## Normal Test Outline
-
-Most iterator tests will follow the given outline:
-
-```java
-import java.util.List;
-import java.util.Map;
-import java.util.SortedMap;
-
-import org.apache.accumulo.core.data.Key;
-import org.apache.accumulo.core.data.Range;
-import org.apache.accumulo.core.data.Value;
-import org.apache.accumulo.iteratortest.IteratorTestCaseFinder;
-import org.apache.accumulo.iteratortest.IteratorTestInput;
-import org.apache.accumulo.iteratortest.IteratorTestOutput;
-import org.apache.accumulo.iteratortest.junit4.BaseJUnit4IteratorTest;
-import org.apache.accumulo.iteratortest.testcases.IteratorTestCase;
-import org.junit.runners.Parameterized.Parameters;
-
-public class MyIteratorTest extends BaseJUnit4IteratorTest {
-
-  @Parameters
-  public static Object[][] parameters() {
-    final IteratorTestInput input = createIteratorInput();
-    final IteratorTestOutput output = createIteratorOutput();
-    final List<IteratorTestCase> testCases = IteratorTestCaseFinder.findAllTestCases();
-    return BaseJUnit4IteratorTest.createParameters(input, output, testCases);
-  }
-
-  private static SortedMap<Key,Value> INPUT_DATA = createInputData();
-  private static SortedMap<Key,Value> OUTPUT_DATA = createOutputData();
-
-  private static SortedMap<Key,Value> createInputData() {
-    // TODO -- implement this method
-  }
-
-  private static SortedMap<Key,Value> createOutputData() {
-    // TODO -- implement this method
-  }
-
-  private static IteratorTestInput createIteratorInput() {
-    final Map<String,String> options = createIteratorOptions(); 
-    final Range range = createRange();
-    return new IteratorTestInput(MyIterator.class, options, range, INPUT_DATA);
-  }
-
-  private static Map<String,String> createIteratorOptions() {
-    // TODO -- implement this method
-    // Tip: Use INPUT_DATA if helpful in generating output
-  }
-
-  private static Range createRange() {
-    // TODO -- implement this method
-  }
-
-  private static IteratorTestOutput createIteratorOutput() {
-    return new IteratorTestOutput(OUTPUT_DATA);
-  }
-}
-```
-
-## Limitations
-
-While the provided `IteratorTestCase`s should exercise common edge-cases in user iterators,
-there are still many limitations to the existing test harness. Some of them are:
-
-  * Can only specify a single iterator, not many (a "stack")
-  * No control over provided IteratorEnvironment for tests
-  * No way to exercise delete keys (especially with major compactions that do not include all files)
-
-These are left as future improvements to the harness.

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/iterators.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/iterators.md b/_docs-unreleased/development/iterators.md
new file mode 100644
index 0000000..947d5e0
--- /dev/null
+++ b/_docs-unreleased/development/iterators.md
@@ -0,0 +1,419 @@
+---
+title: Iterators
+category: development
+order: 1
+---
+
+Accumulo SortedKeyValueIterators, commonly referred to as **Iterators** for short, are server-side programming constructs
+that allow users to implement custom retrieval or computational logic within Accumulo TabletServers. The name rightly
+suggests similarities to the Java Iterator interface; however, Accumulo Iterators are more complex than Java
+Iterators. Notably, in addition to the expected methods to retrieve the current element and advance to the next element
+in the iteration, Accumulo Iterators must also support the ability to "move" (`seek`) to a specified point in the
+iteration (the Accumulo table). Accumulo Iterators are designed to be concatenated together, similar to applying a
+series of transformations to a list of elements. Accumulo Iterators can duplicate their underlying source to create
+multiple "pointers" over the same underlying data (which is extremely powerful since each stream is sorted) or they can
+merge multiple Iterators into a single view. In this sense, a collection of Iterators operating in tandem is closer to
+a tree-structure than a list, but there is always a sense of a flow of Key-Value pairs through some Iterators. Iterators
+are not designed to act as triggers nor are they designed to operate outside of the purview of a single table.
+
+Understanding how TabletServers invoke the methods on a SortedKeyValueIterator can be difficult, as the actual code is
+buried within the implementation of the TabletServer; however, a deep understanding of this is generally unnecessary
+because the interface provides clear definitions of what action each method should take. This
+chapter aims to provide a more detailed description of how Iterators are invoked, some best practices and some common
+pitfalls.
+
+## Instantiation
+
+To invoke an Accumulo Iterator inside of the TabletServer, the Iterator class must be on the classpath of every
+TabletServer. For production environments, it is common to place a JAR file which contains the Iterator in
+`lib/`.  In development environments, it is convenient to instead place the JAR file in `lib/ext/` as JAR files
+in this directory are dynamically reloaded by the TabletServers, alleviating the need to restart Accumulo while
+testing an Iterator. Advanced classloader features also exist which enable loading JARs from other types of filesystems
+and configuring per-table classpaths (as opposed to process-wide classpaths). These features are not covered here, but
+are documented elsewhere in the user manual.
+
+Accumulo references the Iterator class by name and uses Java reflection to instantiate the Iterator. This means that
+Iterators must have a public no-args constructor.
+
+## Interface
+
+A normal implementation of the SortedKeyValueIterator defines functionality for the following methods:
+
+```java
+void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env) throws IOException;
+
+boolean hasTop();
+
+void next() throws IOException;
+
+void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException;
+
+Key getTopKey();
+
+Value getTopValue();
+
+SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env);
+```
+
+### init
+
+The `init` method is called by the TabletServer after it constructs an instance of the Iterator.  This method should
+clear/reset any internal state in the Iterator and prepare it to process data.  The first argument, the `source`, is the
+Iterator "below" this Iterator (where the client is at "top" and the Iterator for files in HDFS are at the "bottom").
+The "source" Iterator provides the Key-Value pairs which this Iterator will operate upon.
+
+The second argument, a Map of options, is made up of options provided by the user, options set in the table's
+configuration, and/or options set in the containing namespace's configuration.
+These options allow for Iterators to dynamically configure themselves on the fly. If no options are used in the current context
+(a Scan or Compaction), the Map will be empty. An example of a configuration item for an Iterator could be a pattern used to filter
+Key-Value pairs in a regular expression Iterator.
+
+The third argument, the `IteratorEnvironment`, is a special object which provides information to this Iterator about the
+context in which it was invoked. Commonly, this information is not necessary to inspect. For example, if an Iterator
+knows that it is running in the context of a full-major compaction (reading all of the data) as opposed to a user scan
+(which may strongly limit the number of columns), the Iterator might make different algorithmic decisions in an attempt to
+optimize itself.
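+
+As an illustrative sketch, a filtering Iterator might capture its source and read a single option in `init`; the option
+name `pattern` and the field names here are hypothetical, not part of any shipped Iterator:
+
+```java
+private SortedKeyValueIterator<Key,Value> source;
+private Pattern pattern;
+
+@Override
+public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+    IteratorEnvironment env) throws IOException {
+  // Reset state and keep a reference to the source Iterator "below" this one
+  this.source = source;
+  // Read configuration from the options map; the option may be absent
+  String regex = options.get("pattern");
+  this.pattern = (regex == null) ? null : Pattern.compile(regex);
+}
+```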
+
+### seek
+
+The `seek` method is likely the most confusing method on the Iterator interface. The purpose of this method is to
+advance the stream of Key-Value pairs to a certain point in the iteration (the Accumulo table). It is common that before
+the implementation of this method returns some additional processing is performed which may further advance the current
+position past the `startKey` of the `Range`. This, however, is dependent on the functionality the iterator provides. For
+example, a filtering iterator would consume a number of Key-Value pairs which do not meet its criteria before `seek`
+returns. The important condition for `seek` to meet is that this Iterator should be ready to return the first Key-Value
+pair, or none if no such pair is available, when the method returns. The Key-Value pair would be returned by `getTopKey`
+and `getTopValue`, respectively, and `hasTop` should return a boolean denoting whether or not there is
+a Key-Value pair to return.
+
+The arguments passed to seek are as follows:
+
+The TabletServer first provides a `Range`, an object which defines some collection of Accumulo `Key`s, which defines the
+Key-Value pairs that this Iterator should return. Each `Range` has a `startKey` and `endKey` with an inclusive flag for
+both. While this Range is often similar to the Range(s) set by the client on a Scanner or BatchScanner, it is not
+guaranteed to be a Range that the client set. Accumulo will split up larger ranges and group them together based on
+Tablet boundaries per TabletServer. Iterators should not attempt to implement any custom logic based on the Range(s)
+provided to `seek` and Iterators should not return any Keys that fall outside of the provided Range.
+
+The second argument, a `Collection<ByteSequence>`, is the set of column families which should be retained or
+excluded by this Iterator. The third argument, a boolean, defines whether the collection of column families
+should be treated as an inclusion collection (true) or an exclusion collection (false).
+
+It is likely that all implementations of `seek` will first make a call to the `seek` method on the
+"source" Iterator that was provided in the `init` method. The collection of column families and
+the boolean `include` argument should be passed down as well as the `Range`. Somewhat commonly, the Iterator will
+also implement some sort of additional logic to find or compute the first Key-Value pair in the provided
+Range. For example, a regular expression Iterator would consume all records which do not match the given
+pattern before returning from `seek`.
+
+It is important to retain the original Range passed to this method to know when this Iterator should stop
+reading more Key-Value pairs. Ignoring this typically does not affect scans from a Scanner, but it
+will result in duplicate keys emitting from a BatchScan if the scanned table has more than one tablet.
+Best practice is to never emit entries outside the seek range.
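+
+Continuing the sketch from the `init` section, a filtering Iterator's `seek` might delegate to its source and then
+advance past leading pairs that it rejects (the `matches` helper below is hypothetical):
+
+```java
+private Range seekRange;
+
+@Override
+public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+    throws IOException {
+  // Remember the Range so that no Keys outside of it are ever emitted
+  this.seekRange = range;
+  // Pass the Range, column families, and inclusion flag down to the source
+  source.seek(range, columnFamilies, inclusive);
+  // Consume leading Key-Value pairs that this Iterator would filter out
+  while (source.hasTop() && !matches(source.getTopKey(), source.getTopValue())) {
+    source.next();
+  }
+}
+```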
+
+### next
+
+The `next` method is analogous to the `next` method on a Java Iterator: this method should advance
+the Iterator to the next Key-Value pair. For implementations that perform some filtering or complex
+logic, this may result in more than one Key-Value pair being inspected. This method alters
+some internal state that is exposed via the `hasTop`, `getTopKey`, and `getTopValue` methods.
+
+The result of this method is commonly caching a Key-Value pair which `getTopKey` and `getTopValue`
+can later return. While there is another Key-Value pair to return, `hasTop` should return true.
+If there are no more Key-Value pairs to return from this Iterator since the last call to
+`seek`, `hasTop` should return false.
+
+### hasTop
+
+The `hasTop` method is similar to the `hasNext` method on a Java Iterator in that it informs
+the caller if there is a Key-Value pair to be returned. If there is no pair to return, this method
+should return false. Like a Java Iterator, multiple calls to `hasTop` (without calling `next`) should not
+alter the internal state of the Iterator.
+
+### getTopKey and getTopValue
+
+These methods simply return the current Key-Value pair for this iterator. If `hasTop` returns true,
+both of these methods should return non-null objects. If `hasTop` returns false, it is undefined
+what these methods should return. Like `hasTop`, multiple calls to these methods should not alter
+the state of the Iterator.
+
+Users should take caution when either
+
+1. caching the Key/Value from `getTopKey`/`getTopValue`, for use after calling `next` on the source iterator.
+In this case, the cached Key/Value object is aliased to the reference returned by the source iterator.
+Iterators may reuse the same Key/Value object in a `next` call for performance reasons, changing the data
+that the cached Key/Value object references and resulting in a logic bug.
+2. modifying the Key/Value from `getTopKey`/`getTopValue`. If the source iterator reuses data stored in the Key/Value,
+then the source iterator may use the modified data that the Key/Value references. This may/may not result in a logic bug.
+
+In both cases, copying the Key/Value's data into a new object ensures iterator correctness. If neither case applies,
+it is safe to not copy the Key/Value.  The general guideline is to be aware of who else may use Key/Value objects
+returned from `getTopKey`/`getTopValue`.
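+
+As a brief sketch, an Iterator that caches the current pair before advancing its source could copy the data to avoid
+aliasing (the field names are illustrative):
+
+```java
+// Copy the data so later reuse of the source's Key/Value objects cannot change the cached pair
+this.topKey = new Key(source.getTopKey());
+this.topValue = new Value(source.getTopValue());
+source.next(); // safe: the cached pair no longer aliases the source's objects
+```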
+
+### deepCopy
+
+The `deepCopy` method is similar to the `clone` method from the Java `Cloneable` interface.
+Implementations of this method should return a new object of the same type as the Accumulo Iterator
+instance it was called on. Any internal state from the instance `deepCopy` was called
+on should be carried over to the returned copy. The returned copy should be ready to have
+`seek` called on it. The SortedKeyValueIterator interface guarantees that `init` will be called on
+an iterator before `deepCopy` and that `init` will not be called on the iterator returned by
+`deepCopy`.
+
+Typically, implementations of `deepCopy` call a copy-constructor which will initialize
+internal data structures. As with `seek`, it is common for the `IteratorEnvironment`
+argument to be ignored as most Iterator implementations can be written without the explicit
+information the environment provides.
+
+In the analogy of a series of Iterators representing a tree, `deepCopy` can be thought of as
+early programming assignments which implement their own tree data structures. `deepCopy` calls
+copy on its sources (the children), copies itself, attaches the copies of the children, and
+then returns itself.
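+
+A sketch of the copy-constructor pattern for a hypothetical `MyIterator` with a `pattern` field:
+
+```java
+private MyIterator(MyIterator other, IteratorEnvironment env) {
+  // Copy internal state and deep-copy the source so the two instances do not share it
+  this.pattern = other.pattern;
+  this.source = other.source.deepCopy(env);
+}
+
+@Override
+public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
+  return new MyIterator(this, env);
+}
+```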
+
+## TabletServer invocation of Iterators
+
+The following code is a general outline for how TabletServers invoke Iterators.
+
+```java
+List<KeyValue> batch;
+Range range = getRangeFromClient();
+while(!overSizeLimit(batch)){
+ SortedKeyValueIterator source = getSystemIterator();
+
+ for(String clzName : getUserIterators()){
+  Class<?> clz = Class.forName(clzName);
+  SortedKeyValueIterator iter = (SortedKeyValueIterator) clz.newInstance();
+  iter.init(source, opts, env);
+  source = iter;
+ }
+
+ // read a batch of data to return to client
+ // the last iterator, the "top"
+ SortedKeyValueIterator topIter = source;
+ topIter.seek(getRangeFromUser(), ...);
+
+ while(topIter.hasTop() && !overSizeLimit(batch)){
+   Key key = topIter.getTopKey();
+   Value val = topIter.getTopValue();
+   batch.add(new KeyValue(key, val));
+   if(systemDataSourcesChanged()){
+     // code does not show isolation case, which will
+     // keep using same data sources until a row boundary is hit
+     range = new Range(key, false, range.getEndKey(), range.isEndKeyInclusive());
+     break;
+   }
+ }
+}
+//return batch of key values to client
+```
+
+Additionally, the obtuse "re-seek" case can be outlined as the following:
+
+```java
+// Given the above
+List<KeyValue> batch = getNextBatch();
+
+// Store off lastKeyReturned for this client
+lastKeyReturned = batch.get(batch.size() - 1).getKey();
+
+// thread goes away (client stops asking for the next batch).
+
+// Eventually client comes back
+// Setup as before...
+
+Range userRange = getRangeFromUser();
+Range actualRange = new Range(lastKeyReturned, false,
+    userRange.getEndKey(), userRange.isEndKeyInclusive());
+
+// Use the actualRange, not the user provided one
+topIter.seek(actualRange);
+```
+
+## Isolation
+
+Accumulo provides a feature which clients can enable to prevent the viewing of partially
+applied mutations within the context of rows. If a client is submitting multiple column
+updates to rows at a time, isolation ensures that a client will either see all of the
+updates made to a row or none of the updates (until they are all applied).
+
+When using isolation, there are additional concerns in iterator design. A scan-time iterator in Accumulo
+reads from a set of data sources. While an iterator is reading data, it has an isolated view. However, after it returns a
+key/value pair, it is possible that Accumulo may switch data sources and re-seek the iterator. This is done so that resources
+may be reclaimed. When the user does not request isolation, this can occur after any key is returned. When a user enables
+isolation, this will only occur after a new row is returned, in which case it will re-seek to the very beginning of the
+next possible row.
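+
+On the client side, a minimal sketch of enabling isolation (assuming an existing `Connector` named `conn` and a table
+named `mytable`):
+
+```java
+import java.util.Map.Entry;
+
+import org.apache.accumulo.core.client.Connector;
+import org.apache.accumulo.core.client.Scanner;
+import org.apache.accumulo.core.client.TableNotFoundException;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.core.security.Authorizations;
+
+public class IsolatedScanExample {
+  public static void scan(Connector conn) throws TableNotFoundException {
+    Scanner scanner = conn.createScanner("mytable", Authorizations.EMPTY);
+    scanner.enableIsolation();  // each row is seen either fully updated or not at all
+    for (Entry<Key,Value> entry : scanner) {
+      // process row-consistent results
+    }
+    scanner.close();
+  }
+}
+```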
+
+## Abstract Iterators
+
+A number of abstract implementations of Iterators are provided to allow for faster creation
+of common patterns. The most commonly used abstract implementations are the `Filter` and
+`Combiner` classes. When possible, these classes should be used instead, as they have been
+thoroughly tested inside Accumulo itself.
+
+### Filter
+
+The `Filter` abstract Iterator provides a very simple base class which allows subclasses
+to define whether or not a Key-Value pair should be returned, via an `accept(Key, Value)` method.
+
+Filters are extremely simple to implement; however, when the implementation is filtering a
+large percentage of Key-Value pairs with respect to the total number of pairs examined,
+it can be very inefficient. For example, if a Filter implementation can determine after examining
+part of the row that no other pairs in this row will be accepted, there is no mechanism to
+efficiently skip the remaining Key-Value pairs. Concretely, take a row which is composed of
+1000 Key-Value pairs. After examining the first 10 Key-Value pairs, it is determined
+that no other Key-Value pairs in this row will be accepted. The Filter must still examine each
+of the remaining 990 Key-Value pairs in this row. Another way to express this deficiency is that
+Filters have no means to leverage the `seek` method to efficiently skip large portions
+of Key-Value pairs.
+
+As such, the `Filter` class functions well for filtering small amounts of data, but is
+inefficient for filtering large amounts of data. The decision to use a `Filter` strongly
+depends on the use case and distribution of data being filtered.
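+
+For example, a minimal `Filter` sketch (the class name and the filtering rule are made up for illustration):
+
+```java
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.core.iterators.Filter;
+
+// Keep only Key-Value pairs whose Value is non-empty; everything else is filtered out.
+public class NonEmptyValueFilter extends Filter {
+  @Override
+  public boolean accept(Key k, Value v) {
+    return v.getSize() > 0;
+  }
+}
+```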
+
+### Combiner
+
+The `Combiner` class is another common abstract Iterator. Similar to the `Combiner` interface
+defined in Hadoop's MapReduce framework, implementations of this abstract class reduce
+multiple Values for different versions of a Key (Keys which differ only by timestamp) into one Key-Value pair.
+Combiners provide a simple way to implement common operations like summation and
+aggregation without the need to implement the entire Accumulo Iterator interface.
+
+One important consideration when choosing to design a Combiner is that the "reduction" operation
+is often best represented when it is associative and commutative. Operations which do not meet
+these criteria can be implemented; however, the implementation can be difficult.
+
+A second consideration is that a Combiner is not guaranteed to see every Key-Value pair
+which differ only by timestamp every time it is invoked. For example, if there are 5 Key-Value
+pairs in a table which only differ by the timestamps 1, 2, 3, 4, and 5, it is not guaranteed that
+every invocation of the Combiner will see 5 timestamps. One invocation might see the Values for
+Keys with timestamp 1 and 4, while another invocation might see the Values for Keys with the
+timestamps 1, 2, 4 and 5.
+
+Finally, when configuring an Accumulo table to use a Combiner, be sure to disable the Versioning Iterator or set the
+Combiner at a priority less than the Versioning Iterator (which is added at a priority of 20 by default). The
+Versioning Iterator will filter out multiple Key-Value pairs that differ only by timestamp and return only the Key-Value
+pair that has the largest timestamp.
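+
+As a hedged sketch, the built-in `SummingCombiner` can be attached at a priority below the Versioning Iterator's
+default of 20 (assuming an existing `Connector` named `conn` and a table named `counts` with a `stats:count` column):
+
+```java
+import java.util.Collections;
+
+import org.apache.accumulo.core.client.Connector;
+import org.apache.accumulo.core.client.IteratorSetting;
+import org.apache.accumulo.core.iterators.LongCombiner;
+import org.apache.accumulo.core.iterators.user.SummingCombiner;
+
+public class AttachSummingCombiner {
+  public static void attach(Connector conn) throws Exception {
+    IteratorSetting setting = new IteratorSetting(10, "sum", SummingCombiner.class);
+    SummingCombiner.setEncodingType(setting, LongCombiner.Type.STRING);
+    SummingCombiner.setColumns(setting,
+        Collections.singletonList(new IteratorSetting.Column("stats", "count")));
+    // priority 10 runs below the Versioning Iterator (priority 20 by default)
+    conn.tableOperations().attachIterator("counts", setting);
+  }
+}
+```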
+
+#### Combiner Applications
+
+Many applications can benefit from the ability to aggregate values across common
+keys. This can be done via Combiner iterators and is similar to the Reduce step in
+MapReduce. This provides the ability to define online, incrementally updated
+analytics without the overhead or latency associated with batch-oriented
+MapReduce jobs.
+
+All that is needed to aggregate values of a table is to identify the fields over which
+values will be grouped, insert mutations with those fields as the key, and configure
+the table with a combining iterator that supports the summarizing operation
+desired.
+
+The only restriction on a combining iterator is that the combiner developer
+should not assume that all values for a given key have been seen, since new
+mutations can be inserted at any time. This precludes, for example, relying on the total
+number of values in the aggregation, as one would when calculating an average.
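+
+One common workaround is to combine partial aggregates and let the reader derive the average. The sketch below
+assumes a simple "sum,count" string encoding, which is an illustration rather than a built-in format:
+
+```java
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+import java.util.Iterator;
+
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.core.iterators.Combiner;
+
+// Combines "sum,count" partial aggregates; an average can be derived by the reader.
+public class SumCountCombiner extends Combiner {
+  @Override
+  public Value reduce(Key key, Iterator<Value> iter) {
+    long sum = 0, count = 0;
+    while (iter.hasNext()) {
+      String[] parts = new String(iter.next().get(), UTF_8).split(",");
+      sum += Long.parseLong(parts[0]);
+      count += Long.parseLong(parts[1]);
+    }
+    return new Value((sum + "," + count).getBytes(UTF_8));
+  }
+}
+```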
+
+An interesting use of combining iterators within an Accumulo table is to store
+feature vectors for use in machine learning algorithms. For example, many
+algorithms such as k-means clustering, support vector machines, anomaly detection,
+etc. use the concept of a feature vector and the calculation of distance metrics to
+learn a particular model. The columns in an Accumulo table can be used to efficiently
+store sparse features and their weights, which can be incrementally updated via the use of a
+combining iterator.
+
+## Best practices
+
+Because of the flexibility that the `SortedKeyValueIterator` interface provides, it does not directly disallow
+many implementations which are poor design decisions. The following are some common recommendations to
+follow and pitfalls to avoid in Iterator implementations.
+
+### Avoid special logic encoded in Ranges
+
+Commonly, the granular Range that a client passes via a `Scanner` or `BatchScanner` reaches an Iterator unmodified.
+If a `Range` falls within the boundaries of a Tablet, an Iterator will often see that same Range in the
+`seek` method. However, there is no guarantee that the `Range` will remain unaltered from client to server. As such, Iterators
+should *never* make assumptions about the current state/context based on the `Range`.
+
+The common failure condition is referred to as a "re-seek". In the context of a Scan, TabletServers construct the
+"stack" of Iterators and batch up Key-Value pairs to send back to the client. When a sufficient number of Key-Value
+pairs are collected, it is common for the Iterators to be "torn down" until the client asks for the next batch of
+Key-Value pairs. This is done by the TabletServer to add fairness in ensuring one Scan does not monopolize the available
+resources. When the client asks for the next batch, the implementation modifies the original Range so that servers know
+the point to resume the iteration (to avoid returning duplicate Key-Value pairs). Specifically, the new Range is created
+from the original but is shortened by setting the startKey of the original Range to the Key last returned by the Scan,
+non-inclusive.
+
+### `seek`'ing backwards
+
+The ability for an Iterator to "skip over" large blocks of Key-Value pairs is a major tenet behind Iterators.
+`seek`'ing past a collection of Key-Value pairs that is known to be ignorable can
+greatly increase the speed of a scan, as many Key-Value pairs do not have to be deserialized and processed.
+
+While the `seek` method provides the `Range` that should be used to `seek` the underlying source Iterator,
+there is no guarantee that the implementing Iterator uses that `Range` to perform the `seek` on its
+"source" Iterator. As such, it is possible to seek to any `Range` and the interface has no assertions
+to prevent this from happening.
+
+Since Iterators are allowed to `seek` to arbitrary Keys, it is also possible for Iterators to create infinite loops
+inside Scans that repeatedly read the same data without end. If an arbitrary Range is constructed, a completely
+new Range object should be created, as reusing an existing Range allows for bugs to be introduced which will break Accumulo.
+
+Thus, calls to `seek` should always be thought of as making "forward progress" in the view of the total iteration. The
+`startKey` of a `Range` should always be greater than the current Key seen by the Iterator while the `endKey` of the
+`Range` should always retain the original `endKey` (and `endKey` inclusivity) of the last `Range` seen by your
+Iterator's implementation of seek.
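+
+A hedged sketch of this rule inside an Iterator's `seek`, assuming the Iterator extends `WrappingIterator` and tracks
+a hypothetical `lastKeyReturned` field:
+
+```java
+import java.io.IOException;
+import java.util.Collection;
+
+import org.apache.accumulo.core.data.ByteSequence;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Range;
+import org.apache.accumulo.core.iterators.WrappingIterator;
+
+public class ForwardProgressIterator extends WrappingIterator {
+  private Key lastKeyReturned;  // hypothetical: updated whenever this iterator returns a Key
+
+  @Override
+  public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+      throws IOException {
+    Range safeRange = range;
+    if (lastKeyReturned != null) {
+      // start just after the last Key returned; keep the caller's end Key and inclusivity
+      safeRange = new Range(lastKeyReturned, false, range.getEndKey(), range.isEndKeyInclusive());
+    }
+    getSource().seek(safeRange, columnFamilies, inclusive);
+  }
+}
+```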
+
+### Take caution in constructing new data in an Iterator
+
+Iterator implementations might be tempted to open BatchWriters inside of an Iterator as a means
+to implement triggers for writing additional data outside of their client application. The lifecycle of an Iterator
+is *not* managed in such a way that guarantees that this is safe or efficient. Specifically, there
+is no way to guarantee that the internal ThreadPool inside of the BatchWriter is closed (and the thread(s)
+are reaped) without calling the `close` method. `close`'ing and recreating a `BatchWriter` after every
+Key-Value pair is also far too costly in performance to be considered an option.
+
+The only safe way to generate additional data in an Iterator is to alter the current Key-Value pair.
+For example, the `WholeRowIterator` serializes all of the Key-Value pairs that fall within each
+row. A safe way to generate more data in an Iterator would be to construct an Iterator that is
+"higher" (at a larger priority) than the `WholeRowIterator`, that is, the Iterator receives the Key-Value pairs which are
+a serialization of many Key-Value pairs. The custom Iterator could deserialize the pairs, compute
+some function, and add a new Key-Value pair to the original collection, re-serializing the collection
+of Key-Value pairs back into a single Key-Value pair.
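+
+A hedged sketch of that approach using `WholeRowIterator`'s static `decodeRow`/`encodeRow` helpers (the derived
+column names are made up for illustration):
+
+```java
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.SortedMap;
+
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.core.iterators.user.WholeRowIterator;
+import org.apache.hadoop.io.Text;
+
+public class RowAugmenter {
+  // Decode an encoded row, add a derived Key-Value pair, and re-encode it.
+  static Value augment(Key encodedKey, Value encodedValue) throws IOException {
+    SortedMap<Key,Value> row = WholeRowIterator.decodeRow(encodedKey, encodedValue);
+    Key derived = new Key(encodedKey.getRow(), new Text("derived"), new Text("count"));
+    row.put(derived, new Value(Integer.toString(row.size()).getBytes(UTF_8)));
+    return WholeRowIterator.encodeRow(new ArrayList<Key>(row.keySet()),
+        new ArrayList<Value>(row.values()));
+  }
+}
+```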
+
+In any other situation, there is no guarantee that the caller (a Scan or a Compaction) will
+always see all of the data that was intended to be generated.
+
+## Final things to remember
+
+Some simple recommendations/points to keep in mind:
+
+### Method call order
+
+On an instance of an Iterator: `init` is always called before `seek`, `seek` is always called before `hasTop`,
+and `getTopKey` and `getTopValue` will not be called if `hasTop` returns false.
+
+### Teardown
+
+As mentioned, instances of Iterators may be torn down inside of the server transparently. When a complex
+collection of Iterators is performing some advanced functionality, they will not be torn down until a Key-Value
+pair is returned out of the "stack" of Iterators (and added into the batch of Key-Values to be returned
+to the caller). Being torn down is equivalent to a new instance of the Iterator being created and `deepCopy`
+being called on the new instance with the old instance provided as the argument to `deepCopy`. References
+to the old instance are removed and the object is lazily garbage collected by the JVM.
+
+## Compaction-time Iterators
+
+When Iterators are configured to run during compactions, at the `minc` or `majc` scope, these Iterators sometimes need
+to make different assertions than those that only operate at scan time. Iterators won't see the delete entries; however,
+Iterators will not necessarily see all of the Key-Value pairs in every invocation. Because compactions often do not rewrite
+all files (only a subset of them), the logic must take this into consideration.
+
+For example, a Combiner that runs over data during compactions might not see all of the values for a given Key. The
+Combiner must recognize this and not perform any function that would be incorrect due
+to the missing values.
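+
+A hedged sketch of checking the execution scope inside `init`, assuming an iterator that extends `WrappingIterator`:
+
+```java
+import java.io.IOException;
+import java.util.Map;
+
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.core.iterators.IteratorEnvironment;
+import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;
+import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
+import org.apache.accumulo.core.iterators.WrappingIterator;
+
+public class ScopeAwareIterator extends WrappingIterator {
+  private boolean mayBeMissingValues;
+
+  @Override
+  public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+      IteratorEnvironment env) throws IOException {
+    super.init(source, options, env);
+    IteratorScope scope = env.getIteratorScope();
+    // during a partial compaction only a subset of files is read, so not all
+    // values for a Key are guaranteed to be present
+    mayBeMissingValues = scope != IteratorScope.scan
+        && !(scope == IteratorScope.majc && env.isFullMajorCompaction());
+  }
+}
+```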
+
+## Testing
+
+The [Iterator test harness][iterator-test-harness] is a generalized testing framework for Accumulo Iterators that can
+identify common pitfalls in user-created Iterators.
+
+[iterator-test-harness]: {{ page.docs_baseurl }}/development/development_tools#iterator-test-harness

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/mapreduce.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/mapreduce.md b/_docs-unreleased/development/mapreduce.md
new file mode 100644
index 0000000..98b2682
--- /dev/null
+++ b/_docs-unreleased/development/mapreduce.md
@@ -0,0 +1,181 @@
+---
+title: MapReduce
+category: development
+order: 2
+---
+
+Accumulo tables can be used as the source and destination of MapReduce jobs. To
+use an Accumulo table with a MapReduce job (specifically with the new Hadoop API
+as of version 0.20), configure the job parameters to use the AccumuloInputFormat
+and AccumuloOutputFormat. Accumulo specific parameters can be set via these
+two format classes to do the following:
+
+* Authenticate and provide user credentials for the input
+* Restrict the scan to a range of rows
+* Restrict the input to a subset of available columns
+
+## Mapper and Reducer classes
+
+To read from an Accumulo table create a Mapper with the following class
+parameterization and be sure to configure the AccumuloInputFormat.
+
+```java
+class MyMapper extends Mapper<Key,Value,WritableComparable,Writable> {
+    public void map(Key k, Value v, Context c) {
+        // transform key and value data here
+    }
+}
+```
+
+To write to an Accumulo table, create a Reducer with the following class
+parameterization and be sure to configure the AccumuloOutputFormat. The key
+emitted from the Reducer identifies the table to which the mutation is sent. This
+allows a single Reducer to write to more than one table if desired. A default table
+can be configured using the AccumuloOutputFormat, in which case the output table
+name does not have to be passed to the Context object within the Reducer.
+
+```java
+class MyReducer extends Reducer<WritableComparable, Writable, Text, Mutation> {
+    public void reduce(WritableComparable key, Iterable<Writable> values, Context c) {
+        // create a mutation based on the input key and values
+        Mutation m = new Mutation(key.toString());
+        // ... add column updates to the mutation here ...
+        c.write(new Text("output-table"), m);
+    }
+}
+```
+
+The Text object passed as the output should contain the name of the table to which
+this mutation should be applied. The Text can be null in which case the mutation
+will be applied to the default table name specified in the AccumuloOutputFormat
+options.
+
+## AccumuloInputFormat options
+
+```java
+Job job = new Job(getConf());
+AccumuloInputFormat.setInputInfo(job,
+        "user",
+        "passwd".getBytes(),
+        "table",
+        new Authorizations());
+
+AccumuloInputFormat.setZooKeeperInstance(job, "myinstance",
+        "zooserver-one,zooserver-two");
+```
+
+**Optional Settings:**
+
+To restrict Accumulo to a set of row ranges:
+
+```java
+ArrayList<Range> ranges = new ArrayList<Range>();
+// populate array list of row ranges ...
+AccumuloInputFormat.setRanges(job, ranges);
+```
+
+To restrict Accumulo to a list of columns:
+
+```java
+ArrayList<Pair<Text,Text>> columns = new ArrayList<Pair<Text,Text>>();
+// populate list of columns
+AccumuloInputFormat.fetchColumns(job, columns);
+```
+
+To use a regular expression to match row IDs:
+
+```java
+IteratorSetting is = new IteratorSetting(30, RegExFilter.class);
+RegExFilter.setRegexs(is, ".*suffix", null, null, null, true);
+AccumuloInputFormat.addIterator(job, is);
+```
+
+## AccumuloMultiTableInputFormat options
+
+The AccumuloMultiTableInputFormat allows scanning over multiple tables
+in a single MapReduce job. Separate ranges, columns, and iterators can be
+used for each table.
+
+```java
+InputTableConfig tableOneConfig = new InputTableConfig();
+InputTableConfig tableTwoConfig = new InputTableConfig();
+```
+
+To set the configuration objects on the job:
+
+```java
+Map<String, InputTableConfig> configs = new HashMap<String,InputTableConfig>();
+configs.put("table1", tableOneConfig);
+configs.put("table2", tableTwoConfig);
+AccumuloMultiTableInputFormat.setInputTableConfigs(job, configs);
+```
+
+**Optional settings:**
+
+To restrict to a set of ranges:
+
+```java
+ArrayList<Range> tableOneRanges = new ArrayList<Range>();
+ArrayList<Range> tableTwoRanges = new ArrayList<Range>();
+// populate array lists of row ranges for tables...
+tableOneConfig.setRanges(tableOneRanges);
+tableTwoConfig.setRanges(tableTwoRanges);
+```
+
+To restrict Accumulo to a list of columns:
+
+```java
+ArrayList<Pair<Text,Text>> tableOneColumns = new ArrayList<Pair<Text,Text>>();
+ArrayList<Pair<Text,Text>> tableTwoColumns = new ArrayList<Pair<Text,Text>>();
+// populate lists of columns for each of the tables ...
+tableOneConfig.fetchColumns(tableOneColumns);
+tableTwoConfig.fetchColumns(tableTwoColumns);
+```
+
+To set scan iterators:
+
+```java
+List<IteratorSetting> tableOneIterators = new ArrayList<IteratorSetting>();
+List<IteratorSetting> tableTwoIterators = new ArrayList<IteratorSetting>();
+// populate the lists of iterator settings for each of the tables ...
+tableOneConfig.setIterators(tableOneIterators);
+tableTwoConfig.setIterators(tableTwoIterators);
+```
+
+The name of the table can be retrieved from the input split:
+
+```java
+class MyMapper extends Mapper<Key,Value,WritableComparable,Writable> {
+    public void map(Key k, Value v, Context c) {
+        RangeInputSplit split = (RangeInputSplit)c.getInputSplit();
+        String tableName = split.getTableName();
+        // do something with table name
+    }
+}
+```
+
+## AccumuloOutputFormat options
+
+```java
+boolean createTables = true;
+String defaultTable = "mytable";
+
+AccumuloOutputFormat.setOutputInfo(job,
+        "user",
+        "passwd".getBytes(),
+        createTables,
+        defaultTable);
+
+AccumuloOutputFormat.setZooKeeperInstance(job, "myinstance",
+        "zooserver-one,zooserver-two");
+```
+
+**Optional Settings:**
+
+```java
+AccumuloOutputFormat.setMaxLatency(job, 300000); // milliseconds
+AccumuloOutputFormat.setMaxMutationBufferSize(job, 50000000); // bytes
+```
+
+The [MapReduce example][mapred-example] contains complete code for using MapReduce with Accumulo.
+
+[mapred-example]: https://github.com/apache/accumulo-examples/blob/master/docs/mapred.md

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/proxy.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/proxy.md b/_docs-unreleased/development/proxy.md
new file mode 100644
index 0000000..6e9f7eb
--- /dev/null
+++ b/_docs-unreleased/development/proxy.md
@@ -0,0 +1,121 @@
+---
+title: Proxy
+category: development
+order: 3
+---
+
+## Proxy
+
+The proxy API allows interaction with Accumulo from languages other than Java.
+A proxy server is provided in the codebase, and a client can then be generated.
+The proxy API can also be used instead of the traditional ZooKeeperInstance class to
+provide a single TCP port through which clients can be securely routed through a firewall,
+without requiring access to all tablet servers in the cluster.
+
+### Prerequisites
+
+The proxy server can live on any node on which the basic client API would work. That
+means it must be able to communicate with the Master, ZooKeepers, NameNode, and the
+DataNodes. A proxy client only needs the ability to communicate with the proxy server.
+
+### Configuration
+
+The configuration options for the proxy server live inside of a properties file. At
+the very least, you need to supply the following properties:
+
+    protocolFactory=org.apache.thrift.protocol.TCompactProtocol$Factory
+    tokenClass=org.apache.accumulo.core.client.security.tokens.PasswordToken
+    port=42424
+    instance=test
+    zookeepers=localhost:2181
+
+You can find a sample configuration file in your distribution at `proxy/proxy.properties`.
+
+This sample configuration file further demonstrates an ability to back the proxy server
+by MockAccumulo or the MiniAccumuloCluster.
+
+### Running the Proxy Server
+
+After the properties file holding the configuration is created, the proxy server
+can be started using the following command in the Accumulo distribution (assuming
+your properties file is named `config.properties`):
+
+    accumulo proxy -p config.properties
+
+### Creating a Proxy Client
+
+Aside from installing the Thrift compiler, you will also need the language-specific library
+for Thrift installed to generate client code in that language. Typically, your operating
+system's package manager will be able to automatically install these for you in an expected
+location such as `/usr/lib/python/site-packages/thrift`.
+
+You can find the thrift file for generating the client at `proxy/proxy.thrift`.
+
+After a client is generated, the port specified in the configuration properties above will be
+used to connect to the server.
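+
+For example, a hedged Java sketch of opening a connection, assuming the proxy runs on localhost with the port and
+compact protocol shown above and the default framed transport (`AccumuloProxy.Client` is the class Thrift generates
+from `proxy.thrift`):
+
+```java
+import org.apache.accumulo.proxy.thrift.AccumuloProxy;
+import org.apache.thrift.protocol.TCompactProtocol;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.transport.TFramedTransport;
+import org.apache.thrift.transport.TSocket;
+import org.apache.thrift.transport.TTransport;
+
+public class ProxyConnect {
+  public static AccumuloProxy.Client connect() throws Exception {
+    TTransport transport = new TFramedTransport(new TSocket("localhost", 42424));
+    transport.open();
+    TProtocol protocol = new TCompactProtocol(transport);
+    return new AccumuloProxy.Client(protocol);
+  }
+}
+```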
+
+### Using a Proxy Client
+
+The following examples have been written in Java, and the method signatures may be
+slightly different depending on the language specified when generating the client with
+the Thrift compiler. After initiating a connection to the Proxy (see Apache Thrift's
+documentation for examples of connecting to a Thrift service), the methods on the
+proxy client will be available. The first thing to do is log in:
+
+```java
+Map<String,String> password = new HashMap<String,String>();
+password.put("password", "secret");
+ByteBuffer token = client.login("root", password);
+```
+
+Once logged in, the token returned will be used for most subsequent calls to the client.
+Let's create a table, add some data, and scan the table.
+
+First, create a table.
+
+```java
+client.createTable(token, "myTable", true, TimeType.MILLIS);
+```
+
+Next, add some data:
+
+```java
+// first, create a writer on the server
+String writer = client.createWriter(token, "myTable", new WriterOptions());
+
+//rowid
+ByteBuffer rowid = ByteBuffer.wrap("UUID".getBytes());
+
+//mutation like class
+ColumnUpdate cu = new ColumnUpdate();
+cu.setColFamily("MyFamily".getBytes());
+cu.setColQualifier("MyQualifier".getBytes());
+cu.setColVisibility("VisLabel".getBytes());
+cu.setValue("Some Value.".getBytes());
+
+List<ColumnUpdate> updates = new ArrayList<ColumnUpdate>();
+updates.add(cu);
+
+// build column updates
+Map<ByteBuffer, List<ColumnUpdate>> cellsToUpdate = new HashMap<ByteBuffer, List<ColumnUpdate>>();
+cellsToUpdate.put(rowid, updates);
+
+// send updates to the server
+client.updateAndFlush(writer, "myTable", cellsToUpdate);
+
+client.closeWriter(writer);
+```
+
+Scan for the data and batch the return of the results on the server:
+
+```java
+String scanner = client.createScanner(token, "myTable", new ScanOptions());
+ScanResult results = client.nextK(scanner, 100);
+
+for(KeyValue keyValue : results.getResults()) {
+  // do something with results
+}
+
+client.closeScanner(scanner);
+```
+

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/sampling.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/sampling.md b/_docs-unreleased/development/sampling.md
index 4a76c39..b1c54ef 100644
--- a/_docs-unreleased/development/sampling.md
+++ b/_docs-unreleased/development/sampling.md
@@ -1,7 +1,7 @@
 ---
 title: Sampling
 category: development
-order: 4
+order: 5
 ---
 
 ## Overview

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/security.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/security.md b/_docs-unreleased/development/security.md
index ea1f997..0671d50 100644
--- a/_docs-unreleased/development/security.md
+++ b/_docs-unreleased/development/security.md
@@ -1,7 +1,7 @@
 ---
 title: Security
 category: development
-order: 6
+order: 7
 ---
 
 Accumulo extends the BigTable data model to implement a security mechanism

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/development/summaries.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/development/summaries.md b/_docs-unreleased/development/summaries.md
index a86e30d..1e8a8b4 100644
--- a/_docs-unreleased/development/summaries.md
+++ b/_docs-unreleased/development/summaries.md
@@ -1,7 +1,7 @@
 ---
 title: Summary Statistics
 category: development
-order: 5
+order: 6
 ---
 
 ## Overview

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/817a0ef7/_docs-unreleased/getting-started/clients.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/getting-started/clients.md b/_docs-unreleased/getting-started/clients.md
index 88d4a13..5dc52d3 100644
--- a/_docs-unreleased/getting-started/clients.md
+++ b/_docs-unreleased/getting-started/clients.md
@@ -265,120 +265,13 @@ You may consider using the [WholeRowIterator] with the BatchScanner to achieve
 isolation. The drawback of this approach is that entire rows are read into
 memory on the server side. If a row is too big, it may crash a tablet server.
 
-## Proxy
+## Additional Documentation
 
-The proxy API allows the interaction with Accumulo with languages other than Java.
-A proxy server is provided in the codebase and a client can further be generated.
-The proxy API can also be used instead of the traditional ZooKeeperInstance class to
-provide a single TCP port in which clients can be securely routed through a firewall,
-without requiring access to all tablet servers in the cluster.
+This page covers Accumulo client basics. Below are links to additional documentation that may be useful when creating Accumulo clients:
 
-### Prerequisites
-
-The proxy server can live on any node in which the basic client API would work. That
-means it must be able to communicate with the Master, ZooKeepers, NameNode, and the
-DataNodes. A proxy client only needs the ability to communicate with the proxy server.
-
-### Configuration
-
-The configuration options for the proxy server live inside of a properties file. At
-the very least, you need to supply the following properties:
-
-    protocolFactory=org.apache.thrift.protocol.TCompactProtocol$Factory
-    tokenClass=org.apache.accumulo.core.client.security.tokens.PasswordToken
-    port=42424
-    instance=test
-    zookeepers=localhost:2181
-
-You can find a sample configuration file in your distribution at `proxy/proxy.properties`.
-
-This sample configuration file further demonstrates an ability to back the proxy server
-by MockAccumulo or the MiniAccumuloCluster.
-
-### Running the Proxy Server
-
-After the properties file holding the configuration is created, the proxy server
-can be started using the following command in the Accumulo distribution (assuming
-your properties file is named `config.properties`):
-
-    accumulo proxy -p config.properties
-
-### Creating a Proxy Client
-
-Aside from installing the Thrift compiler, you will also need the language-specific library
-for Thrift installed to generate client code in that language. Typically, your operating
-system's package manager will be able to automatically install these for you in an expected
-location such as `/usr/lib/python/site-packages/thrift`.
-
-You can find the thrift file for generating the client at `proxy/proxy.thrift`.
-
-After a client is generated, the port specified in the configuration properties above will be
-used to connect to the server.
-
-### Using a Proxy Client
-
-The following examples have been written in Java and the method signatures may be
-slightly different depending on the language specified when generating client with
-the Thrift compiler. After initiating a connection to the Proxy (see Apache Thrift's
-documentation for examples of connecting to a Thrift service), the methods on the
-proxy client will be available. The first thing to do is log in:
-
-```java
-Map password = new HashMap<String,String>();
-password.put("password", "secret");
-ByteBuffer token = client.login("root", password);
-```
-
-Once logged in, the token returned will be used for most subsequent calls to the client.
-Let's create a table, add some data, scan the table, and delete it.
-
-First, create a table.
-
-```java
-client.createTable(token, "myTable", true, TimeType.MILLIS);
-```
-
-Next, add some data:
-
-```java
-// first, create a writer on the server
-String writer = client.createWriter(token, "myTable", new WriterOptions());
-
-//rowid
-ByteBuffer rowid = ByteBuffer.wrap("UUID".getBytes());
-
-//mutation like class
-ColumnUpdate cu = new ColumnUpdate();
-cu.setColFamily("MyFamily".getBytes());
-cu.setColQualifier("MyQualifier".getBytes());
-cu.setColVisibility("VisLabel".getBytes());
-cu.setValue("Some Value.".getBytes());
-
-List<ColumnUpdate> updates = new ArrayList<ColumnUpdate>();
-updates.add(cu);
-
-// build column updates
-Map<ByteBuffer, List<ColumnUpdate>> cellsToUpdate = new HashMap<ByteBuffer, List<ColumnUpdate>>();
-cellsToUpdate.put(rowid, updates);
-
-// send updates to the server
-client.updateAndFlush(writer, "myTable", cellsToUpdate);
-
-client.closeWriter(writer);
-```
-
-Scan for the data and batch the return of the results on the server:
-
-```java
-String scanner = client.createScanner(token, "myTable", new ScanOptions());
-ScanResult results = client.nextK(scanner, 100);
-
-for(KeyValue keyValue : results.getResultsIterator()) {
-  // do something with results
-}
-
-client.closeScanner(scanner);
-```
+* [Iterators] - Server-side programming mechanism that can modify key/value pairs at various points in the data management process
+* [Proxy] - Documentation for interacting with Accumulo using non-Java languages through a proxy server
+* [MapReduce] - Documentation for reading and writing to Accumulo using MapReduce
 
 [PasswordToken]: {{ page.javadoc_core }}/org/apache/accumulo/core/client/security/tokens/PasswordToken.html
 [AuthenticationToken]: {{ page.javadoc_core }}/org/apache/accumulo/core/client/security/tokens/AuthenticationToken.html
@@ -392,3 +285,6 @@ client.closeScanner(scanner);
 [BatchScanner]: {{ page.javadoc_core}}/org/apache/accumulo/core/client/BatchScanner.html
 [Range]: {{ page.javadoc_core }}/org/apache/accumulo/core/data/Range.html
 [WholeRowIterator]: {{ page.javadoc_core }}/org/apache/accumulo/core/iterators/user/WholeRowIterator.html
+[Iterators]: {{ page.docs_baseurl }}/development/iterators
+[Proxy]: {{ page.docs_baseurl }}/development/proxy
+[MapReduce]: {{ page.docs_baseurl }}/development/mapreduce

