fluo-commits mailing list archives

From mwa...@apache.org
Subject [4/8] incubator-fluo-website git commit: Added recipes documentation
Date Fri, 28 Oct 2016 16:57:41 GMT
Added recipes documentation

* Improved conversion scripts


Project: http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/commit/cbb726c6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/tree/cbb726c6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/diff/cbb726c6

Branch: refs/heads/gh-pages
Commit: cbb726c6f62d593943a5b83dbff404b762c8294a
Parents: 4e03352
Author: Mike Walch <mwalch@apache.org>
Authored: Mon Oct 24 11:51:01 2016 -0400
Committer: Mike Walch <mwalch@apache.org>
Committed: Fri Oct 28 12:48:55 2016 -0400

----------------------------------------------------------------------
 README.md                                       |   2 +-
 _config.yml                                     |   4 +-
 _scripts/convert-fluo-docs.py                   |  15 +-
 _scripts/convert-recipes-docs.py                |  15 +-
 .../1.0.0-incubating/accumulo-export-queue.md   |  98 +++++++
 docs/fluo-recipes/1.0.0-incubating/cfm.md       | 244 ++++++++++++++++
 .../1.0.0-incubating/export-queue.md            | 288 +++++++++++++++++++
 docs/fluo-recipes/1.0.0-incubating/index.md     |  94 ++++++
 .../1.0.0-incubating/recording-tx.md            |  72 +++++
 .../fluo-recipes/1.0.0-incubating/row-hasher.md | 122 ++++++++
 .../1.0.0-incubating/serialization.md           |  76 +++++
 .../1.0.0-incubating/table-optimization.md      |  64 +++++
 docs/fluo-recipes/1.0.0-incubating/testing.md   |  14 +
 docs/fluo-recipes/1.0.0-incubating/transient.md |  83 ++++++
 14 files changed, 1181 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index ecbb3b8..b5fbaf1 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ Steps to update website for new Fluo Recipes release:
 
     ```bash
     cd fluo-website
-    mkdir -p docs/1.0.0-beta-1
+    mkdir -p docs/fluo-recipes/1.0.0-beta-1
     ./_scripts/convert-recipes-docs.py /path/to/fluo-recipes/docs/ /path/to/fluo-website/docs/fluo-recipes/1.0.0-beta-1/
     ```
 

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/_config.yml
----------------------------------------------------------------------
diff --git a/_config.yml b/_config.yml
index 6595df7..155f117 100644
--- a/_config.yml
+++ b/_config.yml
@@ -44,9 +44,7 @@ color: default
 
 # Fluo specific settings
 latest_fluo_release: "1.0.0-incubating"
-latest_fluo_release_date: "October 4, 2016"
-latest_recipes_release: "1.0.0-beta-2"
-latest_recipes_release_date: "March 29, 2016"
+latest_recipes_release: "1.0.0-incubating"
 
 # Sets links to external API
 api_base: "https://javadoc.io/doc/org.apache.fluo"

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/_scripts/convert-fluo-docs.py
----------------------------------------------------------------------
diff --git a/_scripts/convert-fluo-docs.py b/_scripts/convert-fluo-docs.py
index b740cdb..4ceed13 100755
--- a/_scripts/convert-fluo-docs.py
+++ b/_scripts/convert-fluo-docs.py
@@ -34,11 +34,18 @@ def convert_file(inPath, outPath):
   print "Creating ", outPath
 
   with open(inPath) as fin:
+
     # skip license
-    for x in range(0, 16):
-      fin.readline()
+    line = ''
+    while not line.startswith('-->'):
+      line = fin.readline().strip()
+
+    # read title
+    title = ''
+    while len(title) == 0:
+      title = fin.readline().strip()
+    title = title.lstrip(' #').strip()
 
-    title = fin.readline().strip()
     fin.readline()
 
     if inPath.endswith("README.md"):
@@ -66,7 +73,7 @@ def convert_file(inPath, outPath):
           elif line.find("../modules") != -1:
             if line.strip().endswith(".java"):
               start = line.find("../modules/")
-              end = line.find("io/fluo")
+              end = line.find("apache/fluo")
               fout.write(line.replace(line[start:end], javadocs_prefix).replace(".java", ".html"))
             else:
               fout.write(line.replace("../modules/", github_prefix))

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/_scripts/convert-recipes-docs.py
----------------------------------------------------------------------
diff --git a/_scripts/convert-recipes-docs.py b/_scripts/convert-recipes-docs.py
index 5beae9b..9ace0ff 100755
--- a/_scripts/convert-recipes-docs.py
+++ b/_scripts/convert-recipes-docs.py
@@ -31,7 +31,18 @@ def convert_file(inPath, outPath):
   print "Creating ", outPath
 
   with open(inPath) as fin:
-    title = fin.readline().lstrip(' #').strip()
+
+    # skip license
+    line = ''
+    while not line.startswith('-->'):
+      line = fin.readline().strip()
+
+    # read title
+    title = ''
+    while len(title) == 0:
+      title = fin.readline().strip()
+    title = title.lstrip(' #').strip()
+
     fin.readline()
 
     if inPath.endswith("README.md"):
@@ -59,7 +70,7 @@ def convert_file(inPath, outPath):
           elif line.find("../modules") != -1:
             if line.strip().endswith(".java"):
               start = line.find("../modules/")
-              end = line.find("io/fluo")
+              end = line.find("apache/fluo")
               fout.write(line.replace(line[start:end], javadocs_prefix).replace(".java", ".html"))
             else:
               fout.write(line.replace("../modules/", github_prefix))

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/docs/fluo-recipes/1.0.0-incubating/accumulo-export-queue.md
----------------------------------------------------------------------
diff --git a/docs/fluo-recipes/1.0.0-incubating/accumulo-export-queue.md b/docs/fluo-recipes/1.0.0-incubating/accumulo-export-queue.md
new file mode 100644
index 0000000..9e9190f
--- /dev/null
+++ b/docs/fluo-recipes/1.0.0-incubating/accumulo-export-queue.md
@@ -0,0 +1,98 @@
+---
+layout: recipes-doc
+title: Accumulo Export Queue Specialization
+version: 1.0.0-incubating
+---
+## Background
+
+The [Export Queue Recipe][1] provides a generic foundation for building an export mechanism to any
+external data store. The [AccumuloExporter] provides an implementation of this recipe for
+Accumulo. The [AccumuloExporter] is located in the `fluo-recipes-accumulo` module and provides the
+following functionality:
+
+ * Safely batches writes to Accumulo made by multiple transactions exporting data.
+ * Stores Accumulo connection information in Fluo configuration, making it accessible by Export
+   Observers running on other nodes.
+ * Provides utility code that makes it easier and shorter to code common Accumulo export patterns.
+
+## Example Use
+
+Exporting to Accumulo is easy. Follow the steps below:
+
+1. Implement a class that extends [AccumuloExporter].  This class will process exported objects that
+   are placed on your export queue. For example, the `SimpleExporter` class below processes String
+   key/value exports and generates mutations for Accumulo.
+
+    ```java
+    public class SimpleExporter extends AccumuloExporter<String, String> {
+
+      @Override
+      protected void translate(SequencedExport<String, String> export, Consumer<Mutation> consumer) {
+        Mutation m = new Mutation(export.getKey());
+        m.put("cf", "cq", export.getSequence(), export.getValue());
+        consumer.accept(m);
+      }
+    }
+    ```
+
+2. With a `SimpleExporter` created, configure an `ExportQueue` to use `SimpleExporter` and
+   give it information on how to connect to Accumulo. 
+
+    ```java
+
+    FluoConfiguration fluoConfig = ...;
+
+    // Set accumulo configuration
+    String instance =       // Name of accumulo instance exporting to
+    String zookeepers =     // Zookeepers used by Accumulo instance exporting to
+    String user =           // Accumulo username, user that can write to exportTable
+    String password =       // Accumulo user password
+    String exportTable =    // Name of table to export to
+
+
+    // Create config for export table.
+    AccumuloExporter.Configuration exportTableCfg =
+        new AccumuloExporter.Configuration(instance, zookeepers, user, password, exportTable);
+
+    // Create config for export queue.
+    ExportQueue.Options eqOpts = new ExportQueue.Options(EXPORT_QUEUE_ID, SimpleExporter.class,
+        String.class, String.class, numMapBuckets).setExporterConfiguration(exportTableCfg);
+
+    // Configure export queue.  This will modify fluoConfig.
+    ExportQueue.configure(fluoConfig, eqOpts);
+
+    // Initialize Fluo using fluoConfig
+    ```
+
+3.  Export queues can be retrieved in Fluo observers and objects can be added to them:
+
+    ```java
+    public class MyObserver extends AbstractObserver {
+
+      ExportQueue<String, String> exportQ;
+
+      @Override
+      public void init(Context context) throws Exception {
+        exportQ = ExportQueue.getInstance(EXPORT_QUEUE_ID, context.getAppConfiguration());
+      }
+
+      @Override
+      public void process(TransactionBase tx, Bytes row, Column col) {
+
+        // Read some data and do some work
+
+        // Add results to export queue
+        String key =    // key that identifies export
+        String value =  // object to export
+        exportQ.add(tx, key, value);
+      }
+    }
+    ```
+
+## Other use cases
+
+[AccumuloReplicator] is a specialized [AccumuloExporter] that replicates a Fluo table to Accumulo.
+
+[1]: /docs/fluo-recipes/1.0.0-incubating/export-queue/
+[AccumuloExporter]: {{ site.api_static }}/fluo-recipes-accumulo/1.0.0-incubating/apache/fluo/recipes/accumulo/export/AccumuloExporter.html
+[AccumuloReplicator]: {{ site.api_static }}/fluo-recipes-accumulo/1.0.0-incubating/apache/fluo/recipes/accumulo/export/AccumuloReplicator.html
+

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/docs/fluo-recipes/1.0.0-incubating/cfm.md
----------------------------------------------------------------------
diff --git a/docs/fluo-recipes/1.0.0-incubating/cfm.md b/docs/fluo-recipes/1.0.0-incubating/cfm.md
new file mode 100644
index 0000000..e23ae38
--- /dev/null
+++ b/docs/fluo-recipes/1.0.0-incubating/cfm.md
@@ -0,0 +1,244 @@
+---
+layout: recipes-doc
+title: Collision Free Map Recipe
+version: 1.0.0-incubating
+---
+## Background
+
+When many transactions are trying to modify the same keys, collisions will occur.
+These collisions will cause the transactions to fail and throughput to nose
+dive.  For example, consider the [phrasecount] example.  In this example many
+transactions are processing documents as input.  Each transaction counts the
+phrases in its document and then tries to update global phrase counts.  With
+each transaction attempting to update many phrase counts, the probability of
+two transactions colliding is very high.
+
+## Solution
+
+This recipe provides a reusable solution for the problem of many transactions
+updating many keys while avoiding collisions.  As an added bonus, this recipe
+also organizes updates into batches to improve throughput.
+
+The central idea behind this recipe is that updates to a key are queued up to
+be processed by another transaction triggered by weak notifications.  In the
+phrase count example transactions processing documents would queue updates,
+but would not actually update the counts.  Below is an example of how
+transactions would compute phrasecounts using this recipe.
+
+ * TX1 queues `+1` update  for phrase `we want lambdas now`
+ * TX2 queues `+1` update  for phrase `we want lambdas now`
+ * TX3 reads the updates and current value for the phrase `we want lambdas now`.  There is no current value and the updates sum to 2, so a new value of 2 is written.
+ * TX4 queues `+2` update  for phrase `we want lambdas now`
+ * TX5 queues `-1` update  for phrase `we want lambdas now`
+ * TX6 reads the updates and current value for the phrase `we want lambdas now`.  The current value is 2 and the updates sum to 1, so a new value of 3 is written.
+
+Transactions processing updates have the ability to make additional updates.
+For example in addition to updating the current value for a phrase, the new
+value could also be placed on an export queue to update an external database.
+
+### Buckets
+
+A simple implementation of this recipe would be to have an update queue for
+each key.  However, the implementation does something slightly more complex.
+Each update queue is in a bucket and transactions that process updates process
+all of the updates in a bucket.  This allows more efficient processing of
+updates for the following reasons:
+
+ * When updates are queued, notifications are made per bucket (instead of per key).
+ * The transaction doing the update can scan the entire bucket reading updates, which avoids a seek for each key being updated.
+ * Also, the transaction can request a batch lookup to get the current value of all the keys being updated.
+ * Any additional actions taken on update (like adding something to an export queue) can also be batched.
+ * Data is organized to make reading existing values for keys in a bucket more efficient.
+
+Which bucket a key goes to is decided using hash and modulus so that multiple
+updates for the same key always go to the same bucket.
+
+The initial number of tablets to create when applying table optimizations can be
+controlled by setting the buckets per tablet option when configuring a Collision
+Free Map.  For example, if you have 20 tablet servers and 1000 buckets and want
+2 tablets per tserver initially, then set buckets per tablet to 1000/(2*20)=25,
+as sketched below.
+
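+The snippet below is only an illustration of those two ideas (hash plus modulus
+for bucket assignment, and the buckets-per-tablet arithmetic); it is not the
+actual Collision Free Map implementation, and the key and cluster sizes are made
+up for the example.
+
+```java
+public class BucketMath {
+  public static void main(String[] args) {
+    int numBuckets = 1000;
+    int numTabletServers = 20;
+    int tabletsPerServer = 2;
+
+    // hash + modulus: multiple updates for the same key always land in the same bucket
+    String key = "we want lambdas now";
+    int bucket = Math.floorMod(key.hashCode(), numBuckets);
+    System.out.println("bucket for key     : " + bucket);
+
+    // buckets per tablet so table optimizations create ~2 tablets per tserver initially
+    int bucketsPerTablet = numBuckets / (tabletsPerServer * numTabletServers); // 1000/(2*20) = 25
+    System.out.println("buckets per tablet : " + bucketsPerTablet);
+  }
+}
+```
+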
+## Example Use
+
+The following code snippets show how to set up and use this recipe for
+wordcount.  The first step in using this recipe is to configure it before
+initializing Fluo.  When configuring it, an ID will need to be provided.  This ID
+is used in two ways.  First, the ID is used as a row prefix in the table.
+Therefore nothing else should use that row range in the table.  Second, the ID
+is used in generating configuration keys associated with the instance of the
+Collision Free Map.
+
+The following snippet shows how to set up a collision free map.
+
+```java
+  FluoConfiguration fluoConfig = ...;
+
+  int numBuckets = 119;
+  int numTablets = 10;
+
+  WordCountMap.configure(fluoConfig, numBuckets, numTablets);
+
+  //initialize Fluo using fluoConfig
+
+```
+
+Assume the following observer is triggered when a document's contents are
+updated.  It examines new and old document content and determines changes in
+word counts.  These changes are pushed to a collision free map.
+
+```java
+public class DocumentObserver extends TypedObserver {
+
+  CollisionFreeMap<String, Long> wcm;
+
+  @Override
+  public void init(Context context) throws Exception {
+    wcm = CollisionFreeMap.getInstance(WordCountMap.ID, context.getAppConfiguration());
+  }
+
+  @Override
+  public ObservedColumn getObservedColumn() {
+    return new ObservedColumn(new Column("content", "new"), NotificationType.STRONG);
+  }
+
+  @Override
+  public void process(TypedTransactionBase tx, Bytes row, Column col) {
+    String newContent = tx.get().row(row).col(col).toString();
+    String currentContent = tx.get().row(row).fam("content").qual("current").toString("");
+
+    Map<String, Long> newWordCounts = getWordCounts(newContent);
+    Map<String, Long> currentWordCounts = getWordCounts(currentContent);
+
+    //determine changes in word counts between old and new document content
+    Map<String, Long> changes = calculateChanges(newWordCounts, currentWordCounts);    
+
+    //queue updates to word counts for processing by other transactions
+    wcm.update(tx, changes);
+
+    //update the current content and delete the new content
+    tx.mutate().row(row).fam("content").qual("current").set(newContent);
+    tx.mutate().row(row).col(col).delete();
+  }
+
+  private static Map<String, Long> getWordCounts(String doc) {
+   //TODO extract words from doc
+  }
+
+  private static Map<String, Long> calculateChanges(Map<String, Long> newCounts,
+      Map<String, Long> currCounts) {
+    Map<String, Long> changes = new HashMap<>();
+
+    // guava Maps class
+    MapDifference<String, Long> diffs = Maps.difference(currCounts, newCounts);
+
+    // compute the diffs for words that changed
+    changes.putAll(Maps.transformValues(diffs.entriesDiffering(),
+        vDiff -> vDiff.rightValue() - vDiff.leftValue()));
+
+    // add all new words
+    changes.putAll(diffs.entriesOnlyOnRight());
+
+    // subtract all words no longer present
+    changes.putAll(Maps.transformValues(diffs.entriesOnlyOnLeft(), l -> l * -1));
+
+    return changes;
+  }
+
+}
+```
+
+Each collision free map has two extension points, a combiner and an update
+observer.  These two extension points are defined below as `WordCountCombiner`
+and  `WordCountObserver`.  The collision free map configures a Fluo observer that
+will process queued updates.  When processing these queued updates the two
+extension points are called.  In this example `WordCountCombiner` is called to
+combine updates that were queued by `DocumentObserver`. The collision free map
+will process a batch of keys, calling the combiner for each key.  When finished
+processing a batch, it will call the update observer `WordCountObserver`.
+
+An update observer can do additional processing when a batch of key values are
+updated.  In `WordCountObserver`, updates are queued for export to an external
+database.  The export is given the new and old value allowing it to delete the
+old value if needed.
+
+```java
+/**
+ * This class exists to provide a single place to put all code related to the
+ * word count map.
+ */
+public class WordCountMap {
+
+  public static final String ID = "wc";
+
+  /**
+   * A helper method for configuring the word count map.
+   *
+   * @param numTablets the desired number of tablets to create when applying table optimizations
+   */
+  public static void configure(FluoConfiguration fluoConfig, int numBuckets, int numTablets) {
+    Options cfmOpts =
+      new Options(ID, WordCountCombiner.class,  WordCountObserver.class, String.class, Long.class, numBuckets)
+        .setBucketsPerTablet(numBuckets/numTablets);
+    CollisionFreeMap.configure(fluoConfig, cfmOpts);
+  }
+
+  public static class WordCountCombiner implements Combiner<String, Long> {
+    @Override
+    public Optional<Long> combine(String key, Iterator<Long> updates) {
+      long sum = 0L;
+
+      while (updates.hasNext()) {
+        sum += updates.next();
+      }
+
+      if (sum == 0) {
+        //returning absent will cause the collision free map to delete the current key
+        return Optional.absent();
+      } else {
+        return Optional.of(sum);
+      }
+    }
+  }
+
+  public static class WordCountObserver extends UpdateObserver<String, Long> {
+
+    private ExportQueue<String, MyDatabaseExport> exportQ;
+
+    @Override
+    public void init(String mapId, Context observerContext) throws Exception {
+      exportQ = ExportQueue.getInstance(MyExportQ.ID, observerContext.getAppConfiguration());
+    }
+
+    @Override
+    public void updatingValues(TransactionBase tx, Iterator<Update<String, Long>> updates) {
+      while (updates.hasNext()) {
+        Update<String, Long> update = updates.next();
+
+        String word = update.getKey();
+        Optional<Long> oldVal = update.getOldValue();
+        Optional<Long> newVal = update.getNewValue();
+
+        //queue an export to let an external database know the word count has changed
+        exportQ.add(tx, word, new MyDatabaseExport(oldVal, newVal));
+      }
+    }
+  }
+}
+```
+
+## Guarantees
+
+This recipe makes two important guarantees about updates for a key when it
+calls `updatingValues()` on an `UpdateObserver`.
+
+ * The new value reported for an update will be derived from combining all
+   updates that were committed before the transaction thats processing updates
+   started.  The implementation may have to make multiple passes over queued
+   updates to achieve this.  In the situation where TX1 queues a `+1` and later
+   TX2 queues a `-1` for the same key, there is no need to worry about only seeing
+   the `-1` processed.  A transaction that started processing updates after TX2
+   committed would process both.
+ * The old value will always be what was reported as the new value in the
+   previous transaction that called `updatingValues()`.  
+ 
+[phrasecount]: https://github.com/fluo-io/phrasecount

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/docs/fluo-recipes/1.0.0-incubating/export-queue.md
----------------------------------------------------------------------
diff --git a/docs/fluo-recipes/1.0.0-incubating/export-queue.md b/docs/fluo-recipes/1.0.0-incubating/export-queue.md
new file mode 100644
index 0000000..4708f79
--- /dev/null
+++ b/docs/fluo-recipes/1.0.0-incubating/export-queue.md
@@ -0,0 +1,288 @@
+---
+layout: recipes-doc
+title: Export Queue Recipe
+version: 1.0.0-incubating
+---
+## Background
+
+Fluo is not suited for servicing low latency queries for two reasons.  The
+first reason is that the implementation of transactions is designed for
+throughput.  To get throughput, transactions recover lazily from failures and
+may wait on another transaction that is writing.  Both of these design decisions
+can lead to delays for an individual transaction, but do not negatively impact
+throughput.   The second reason is that Fluo observers executing transactions
+will likely cause a large number of random accesses.  This could lead to high
+response time variability for an individual random access.  This variability
+would not impede throughput but would impede the goal of low latency.
+
+One way to make data transformed by Fluo available for low latency queries is
+to export that data to another system.  For example Fluo could be running
+cluster A, continually transforming a large data set, and exporting data to
+Accumulo tables on cluster B.  The tables on cluster B would service user
+queries.  Fluo Recipes has built in support for [exporting to Accumulo][3],
+however recipe could be used to export to systems other than Accumulo, like
+Redis, Elasticsearch, MySQL, etc.
+
+Exporting data from Fluo is easy to get wrong which is why this recipe exists.
+To understand what can go wrong consider the following example observer
+transaction.
+
+```java
+public class MyObserver extends AbstractObserver {
+
+    private static final TypeLayer TYPEL = new TypeLayer(new StringEncoder());
+
+    //represents a query system external to Fluo that is updated by Fluo
+    QuerySystem querySystem;
+
+    @Override
+    public void process(TransactionBase tx, Bytes row, Column col) {
+
+        TypedTransactionBase ttx = TYPEL.wrap(tx);
+        int oldCount = ttx.get().row(row).fam("meta").qual("counter1").toInteger(0);
+        int numUpdates = ttx.get().row(row).fam("meta").qual("numUpdates").toInteger(0);
+        int newCount = oldCount + numUpdates;
+        ttx.mutate().row(row).fam("meta").qual("counter1").set(newCount);
+        ttx.mutate().row(row).fam("meta").qual("numUpdates").set(0);
+
+        //Build an inverted index in the query system, based on count from the
+        //meta:counter1 column in fluo.  Do this by creating rows for the
+        //external query system based on the count.        
+        String oldCountRow = String.format("%06d", oldCount);
+        String newCountRow = String.format("%06d", newCount);
+        
+        //add a new entry to the inverted index
+        querySystem.insertRow(newCountRow, row);
+        //remove the old entry from the inverted index
+        querySystem.deleteRow(oldCountRow, row);
+    }
+}
+```
+
+The above example would keep the external index up to date beautifully as long
+as the following conditions are met.
+
+  * Threads executing transactions always complete successfully.
+  * Only a single thread ever responds to a notification.
+
+However these conditions are not guaranteed by Fluo.  Multiple threads may
+attempt to process a notification concurrently (only one may succeed).  Also at
+any point in time a transaction may fail (for example the computer executing it
+may reboot).   Both of these problems will occur and will lead to corruption of
+the external index in the example.  The inverted index and Fluo will become
+inconsistent.  The inverted index will end up with multiple entries (that are
+never cleaned up) for a single entity even though the intent is to only have one.
+
+The root of the problem in the example above is that it is exporting uncommitted
+data.  There is no guarantee that setting the column `<row>:meta:counter1` to
+`newCount` will succeed until the transaction is successfully committed.
+However, `newCountRow` is derived from `newCount` and written to the external query
+system before the transaction is committed (Note: for observers, the
+transaction is committed by the framework after `process(...)` is called).  So
+if the transaction fails, the next time it runs it could compute a completely
+different value for `newCountRow` (and it would not delete what was written by the
+failed transaction).  
+
+## Solution 
+
+The simple solution to the problem of exporting uncommitted data is to only
+export committed data.  There are multiple ways to accomplish this.  This
+recipe offers a reusable implementation of one method.  This recipe has the
+following elements:
+
+ * An export queue that transactions can add key/values to.  Only if the transaction commits successfully will the key/value end up in the queue.  A Fluo application can have multiple export queues, each one must have a unique id.
+ * When a key/value is added to the export queue, it is given a sequence number.  This sequence number is based on the transaction's start timestamp.
+ * Each export queue is configured with an observer that processes key/values that were successfully committed to the queue.
+ * When key/values in an export queue are processed, they are deleted so the export queue does not keep any long term data.
+ * Key/values in an export queue are placed in buckets.  This is done so that all of the updates in a bucket can be processed in a single transaction.  This allows an efficient implementation of this recipe in Fluo.  It can also lead to efficiency in a system being exported to, if the system can benefit from batching updates.  The number of buckets in an export queue is configurable.
+
+There are three requirements for using this recipe:
+
+ * Must configure export queues before initializing a Fluo application.
+ * Transactions adding to an export queue must get an instance of the queue using its unique QID.
+ * Must implement a class that extends [Exporter][1] in order to process exports.
+
+## Schema
+
+Each export queue stores its data in the Fluo table in a contiguous row range.
+This row range is defined by using the export queue id as a row prefix for all
+data in the export queue.  So the row range defined by the export queue id
+should not be used by anything else.
+
+All data stored in an export queue is [transient](transient.md). When an export
+queue is configured, it will recommend split points using the [table
+optimization process](table-optimization.md).  The number of splits generated
+by this process can be controlled by setting the number of buckets per tablet
+when configuring an export queue.
+
+## Example Use
+
+This example will show how to build an inverted index in an external
+query system using an export queue.  The class below is a simple POJO to hold the
+count update; it will be used as the value for the export queue.
+
+```java
+class CountUpdate {
+  public int oldCount;
+  public int newCount;
+  
+  public CountUpdate(int oc, int nc) {
+    this.oldCount = oc;
+    this.newCount = nc;
+  }
+}
+```
+
+The following code shows how to configure an export queue.  This code will
+modify the FluoConfiguration object with options needed for the export queue.
+This FluoConfiguration object should be used to initialize the Fluo
+application.
+
+```java
+   FluoConfiguration fluoConfig = ...;
+
+   //queue id "ici" means inverted count index, would probably use static final in real application
+   String exportQueueID = "ici";  
+   Class<CountExporter> exporterType = CountExporter.class;
+   Class<Bytes> keyType = Bytes.class;
+   Class<CountUpdate> valueType = CountUpdate.class;
+   int numBuckets = 1009;
+   //the desired number of tablets to create when applying table optimizations
+   int numTablets = 100;
+
+   ExportQueue.Options eqOptions =
+        new ExportQueue.Options(exportQueueId, exporterType, keyType, valueType, numBuckets)
+          .setBucketsPerTablet(numBuckets / numTablets);
+   ExportQueue.configure(fluoConfig, eqOptions);
+
+   //initialize Fluo using fluoConfig
+```
+
+Below is an updated version of the observer from above that now uses the export
+queue.
+
+```java
+public class MyObserver extends AbstractObserver {
+
+    private static final TypeLayer TYPEL = new TypeLayer(new StringEncoder());
+
+    private ExportQueue<Bytes, CountUpdate> exportQueue;
+
+    @Override
+    public void init(Context context) throws Exception {
+      exportQueue = ExportQueue.getInstance("ici", context.getAppConfiguration());
+    }
+
+    @Override
+    public void process(TransactionBase tx, Bytes row, Column col) {
+
+        TypedTransactionBase ttx = TYPEL.wrap(tx);
+        int oldCount = ttx.get().row(row).fam("meta").qual("counter1").toInteger(0);
+        int numUpdates = ttx.get().row(row).fam("meta").qual("numUpdates").toInteger(0);
+        int newCount = oldCount + numUpdates;
+        ttx.mutate().row(row).fam("meta").qual("counter1").set(newCount);
+        ttx.mutate().row(row).fam("meta").qual("numUpdates").set(0);
+
+        //Because the update to the export queue is part of the transaction,
+        //either the update to meta:counter1 is made and an entry is added to
+        //the export queue or neither happens.
+        exportQueue.add(tx, row, new CountUpdate(oldCount, newCount));
+    }
+}
+```
+
+The observer set up for the export queue will call the `processExports()` method
+on the class below to process entries in the export queue.  It is possible the
+call to `processExports()` can fail partway through and/or be called multiple
+times.  In the case of failures the exporter will eventually be called again
+with the exact same data.  The possibility of the same export entry being
+processed on multiple computers at different times can cause exports to arrive
+out of order.  The system receiving exports has to be resilient to data
+arriving out of order and multiple times.  The purpose of the sequence number
+is to help systems receiving data from Fluo process out-of-order and redundant
+data.
+
+```java
+  public class CountExporter extends Exporter<Bytes, CountUpdate> {
+    //represents the external query system we want to update from Fluo
+    QuerySystem querySystem;
+
+    @Override
+    protected void processExports(Iterator<SequencedExport<Bytes, CountUpdate>> exportIterator) {
+      BatchUpdater batchUpdater = querySystem.getBatchUpdater();
+      while(exportIterator.hasNext()){
+        SequencedExport<Bytes, CountUpdate> exportEntry = exportIterator.next();
+        Bytes row = exportEntry.getKey();
+        CountUpdate uc = exportEntry.getValue();
+        long seqNum = exportEntry.getSequence();
+
+        String oldCountRow = String.format("%06d", uc.oldCount);
+        String newCountRow = String.format("%06d", uc.newCount);
+
+        //add a new entry to the inverted index
+        batchUpdater.insertRow(newCountRow, row, seqNum);
+        //remove the old entry from the inverted index
+        batchUpdater.deleteRow(oldCountRow, row, seqNum);
+      }
+
+      //flush all of the updates to the external query system
+      batchUpdater.close();
+    }
+  }
+```
+
+## Concurrency
+
+Additions to the export queue will never collide.  If two transactions add the
+same key at around the same time and successfully commit, then two entries with
+different sequence numbers will always be added to the queue.  The sequence
+number is based on the start timestamp of the transactions.
+
+If the key used to add items to the export queue is deterministically derived
+from something the transaction is writing to, then that will cause a collision.
+For example consider the following interleaving of two transactions adding to
+the same export queue in a manner that will collide. Note, TH1 is shorthand for
+thread 1, ek() is a function that creates the export key, and ev() is a function
+that creates the export value.
+
+ 1. TH1 : key1 = ek(`row1`,`fam1:qual1`)
+ 1. TH1 : val1 = ev(tx1.get(`row1`,`fam1:qual1`), tx1.get(`rowA`,`fam1:qual2`))
+ 1. TH1 : exportQueueA.add(tx1, key1, val1)
+ 1. TH2 : key2 = ek(`row1`,`fam1:qual1`)
+ 1. TH2 : val2 = ev(tx2.get(`row1`,`fam1:qual1`), tx2.get(`rowB`,`fam1:qual2`))
+ 1. TH2 : exportQueueA.add(tx2, key2, val2)
+ 1. TH1 : tx1.set(`row1`,`fam1:qual1`, val1)
+ 1. TH2 : tx2.set(`row1`,`fam1:qual1`, val2)
+
+In the example above only one transaction will succeed because both are setting
+`row1 fam1:qual1`.  Since adding to the export queue is part of the
+transaction, only the transaction that succeeds will add something to the
+queue.  If the function ek() in the example is deterministic, then both
+transactions would have been trying to add the same key to the export queue.
+
+With the above method, we know that transactions adding entries to the queue for
+the same key must have executed [serially][2]. Knowing that transactions which
+added the same key did not overlap in time makes reasoning about those export
+entries very simple.
+
+The example below is a slight modification of the example above.  In this
+example both transactions will successfully add entries to the queue using the
+same key.  Both transactions succeed because they are writing to different
+cells (`rowB fam1:qual2` and `rowA fam1:qual2`).  This approach makes it more
+difficult to reason about export entries with the same key, because the
+transactions adding those entries could have overlapped in time.  This is an
+example of write skew mentioned in the Percolator paper.
+ 
+ 1. TH1 : key1 = ek(`row1`,`fam1:qual1`)
+ 1. TH1 : val1 = ev(tx1.get(`row1`,`fam1:qual1`), tx1.get(`rowA`,`fam1:qual2`))
+ 1. TH1 : exportQueueA.add(tx1, key1, val1)
+ 1. TH2 : key2 = ek(`row1`,`fam1:qual1`)
+ 1. TH2 : val2 = ev(tx2.get(`row1`,`fam1:qual1`), tx2.get(`rowB`,`fam1:qual2`))
+ 1. TH2 : exportQueueA.add(tx2, key2, val2)
+ 1. TH1 : tx1.set(`rowA`,`fam1:qual2`, val1)
+ 1. TH2 : tx2.set(`rowB`,`fam1:qual2`, val2)
+
+[1]: {{ site.api_static }}/fluo-recipes-core/1.0.0-incubating/apache/fluo/recipes/core/export/Exporter.html
+[2]: https://en.wikipedia.org/wiki/Serializability
+[3]: /docs/fluo-recipes/1.0.0-incubating/accumulo-export-queue/
+

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/docs/fluo-recipes/1.0.0-incubating/index.md
----------------------------------------------------------------------
diff --git a/docs/fluo-recipes/1.0.0-incubating/index.md b/docs/fluo-recipes/1.0.0-incubating/index.md
new file mode 100644
index 0000000..9bf6430
--- /dev/null
+++ b/docs/fluo-recipes/1.0.0-incubating/index.md
@@ -0,0 +1,94 @@
+---
+layout: recipes-doc
+title: Fluo Recipes 1.0.0-incubating Documentation
+version: 1.0.0-incubating
+---
+**Fluo Recipes are common code for [Apache Fluo][fluo] application developers.**
+
+Fluo Recipes build on the [Fluo API][fluo-api] to offer additional functionality to
+developers. They are published separately from Fluo on their own release schedule.
+This allows Fluo Recipes to iterate and innovate faster than Fluo (which will maintain
+a more minimal API on a slower release cycle).
+
+### Documentation
+
+Recipes are documented below and in the [Recipes API docs][recipes-api].
+
+* [Collision Free Map][cfm] - A recipe for making many to many updates.
+* [Export Queue][export-q] - A recipe for exporting data from Fluo to external systems.
+* [Row Hash Prefix][row-hasher] - A recipe for spreading data evenly in a row prefix.
+* [RecordingTransaction][recording-tx] - A wrapper for a Fluo transaction that records all transaction
+operations to a log which can be used to export data from Fluo.
+* [Testing][testing] - Code to help write Fluo integration tests.
+
+Recipes have common needs that are broken down into the following reusable components.
+
+* [Serialization][serialization] - Common code for serializing POJOs. 
+* [Transient Ranges][transient] - Standardized process for dealing with transient data.
+* [Table optimization][optimization] - Standardized process for optimizing the Fluo table.
+
+### Usage
+
+The Fluo Recipes project publishes multiple jars to Maven Central for each release.
+The `fluo-recipes-core` jar is the primary jar. It is where most recipes live and where
+they are placed by default if they have minimal dependencies beyond the Fluo API.
+
+Fluo Recipes with dependencies that bring in many transitive dependencies publish
+their own jar. For example, recipes that depend on Apache Spark are published in the
+`fluo-recipes-spark` jar.  If you don't plan on using code in the `fluo-recipes-spark`
+jar, you should avoid including it in your pom.xml to avoid a transitive dependency on
+Spark.
+
+Below is a sample Maven POM containing all possible Fluo Recipes dependencies:
+
+```xml
+  <properties>
+    <fluo-recipes.version>1.0.0-incubating</fluo-recipes.version>
+  </properties>
+
+  <dependencies>
+    <!-- Required. Contains recipes that only depend on the Fluo API -->
+    <dependency>
+      <groupId>org.apache.fluo</groupId>
+      <artifactId>fluo-recipes-core</artifactId>
+      <version>${fluo-recipes.version}</version>
+    </dependency>
+    <!-- Optional. Serialization code that depends on Kryo -->
+    <dependency>
+      <groupId>org.apache.fluo</groupId>
+      <artifactId>fluo-recipes-kryo</artifactId>
+      <version>${fluo-recipes.version}</version>
+    </dependency>
+    <!-- Optional. Common code for using Fluo with Accumulo -->
+    <dependency>
+      <groupId>org.apache.fluo</groupId>
+      <artifactId>fluo-recipes-accumulo</artifactId>
+      <version>${fluo-recipes.version}</version>
+    </dependency>
+    <!-- Optional. Common code for using Fluo with Spark -->
+    <dependency>
+      <groupId>org.apache.fluo</groupId>
+      <artifactId>fluo-recipes-spark</artifactId>
+      <version>${fluo-recipes.version}</version>
+    </dependency>
+    <!-- Optional. Common code for writing Fluo integration tests -->
+    <dependency>
+      <groupId>org.apache.fluo</groupId>
+      <artifactId>fluo-recipes-test</artifactId>
+      <version>${fluo-recipes.version}</version>
+      <scope>test</scope>
+    </dependency>
+  </dependencies>
+```
+
+[fluo]: https://fluo.apache.org/
+[fluo-api]: https://fluo.apache.org/api/
+[recipes-api]: https://fluo.apache.org/apidocs/fluo-recipes/
+[cfm]: /docs/fluo-recipes/1.0.0-incubating/cfm/
+[export-q]: /docs/fluo-recipes/1.0.0-incubating/export-queue/
+[recording-tx]: /docs/fluo-recipes/1.0.0-incubating/recording-tx/
+[serialization]: /docs/fluo-recipes/1.0.0-incubating/serialization/
+[transient]: /docs/fluo-recipes/1.0.0-incubating/transient/
+[optimization]: /docs/fluo-recipes/1.0.0-incubating/table-optimization/
+[row-hasher]: /docs/fluo-recipes/1.0.0-incubating/row-hasher/
+[testing]: /docs/fluo-recipes/1.0.0-incubating/testing/

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/docs/fluo-recipes/1.0.0-incubating/recording-tx.md
----------------------------------------------------------------------
diff --git a/docs/fluo-recipes/1.0.0-incubating/recording-tx.md b/docs/fluo-recipes/1.0.0-incubating/recording-tx.md
new file mode 100644
index 0000000..ee0e62e
--- /dev/null
+++ b/docs/fluo-recipes/1.0.0-incubating/recording-tx.md
@@ -0,0 +1,72 @@
+---
+layout: recipes-doc
+title: RecordingTransaction recipe
+version: 1.0.0-incubating
+---
+A `RecordingTransaction` is an implementation of `Transaction` that logs all transaction operations
+(i.e., GET, SET, or DELETE) to a `TxLog` object for later use, such as exporting data.  The code below
+shows how a RecordingTransaction is created by wrapping a Transaction object:
+
+```java
+RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx);
+```
+
+A predicate function can be passed to the wrap method to select which log entries to record.  The code
+below only records log entries whose column family is `meta`:
+
+```java
+RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx,
+                               le -> le.getColumn().getFamily().toString().equals("meta"));
+```
+
+After creating a RecordingTransaction, users can use it as they would use a Transaction object.
+
+```java
+Bytes value = rtx.get(Bytes.of("r1"), new Column("cf1", "cq1"));
+```
+
+While SET or DELETE operations are always recorded to the log, GET operations are only recorded if a
+value was found at the requested row/column.  Also, if a GET method returns an iterator, only the GET
+operations that are retrieved from the iterator are logged.  GET operations are logged because they are
+necessary if you want to determine the changes made by the transaction.
+ 
+When you are done operating on the transaction, you can retrieve the TxLog using the following code:
+
+```java
+TxLog myTxLog = rtx.getTxLog();
+```
+
+Below is example code showing how a RecordingTransaction can be used in an observer to record all operations
+performed by the transaction in a TxLog.  In this example, a GET (if data exists) and SET operation
+will be logged.  This TxLog can be added to an export queue and later used to export updates from 
+Fluo.
+
+```java
+public class MyObserver extends AbstractObserver {
+
+    private static final TypeLayer TYPEL = new TypeLayer(new StringEncoder());
+    
+    private ExportQueue<Bytes, TxLog> exportQueue;
+
+    @Override
+    public void process(TransactionBase tx, Bytes row, Column col) {
+
+        // create recording transaction (rtx)
+        RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx);
+        
+        // use rtx to create a typed transaction & perform operations
+        TypedTransactionBase ttx = TYPEL.wrap(rtx);
+        int count = ttx.get().row(row).fam("meta").qual("counter1").toInteger(0);
+        ttx.mutate().row(row).fam("meta").qual("counter1").set(count+1);
+        
+        // when finished performing operations, retrieve transaction log
+        TxLog txLog = rtx.getTxLog();
+
+        // add txLog to exportQueue if not empty
+        if (!txLog.isEmpty()) {
+          //do not pass rtx to exportQueue.add()
+          exportQueue.add(tx, row, txLog);
+        }
+    }
+}
+```

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/docs/fluo-recipes/1.0.0-incubating/row-hasher.md
----------------------------------------------------------------------
diff --git a/docs/fluo-recipes/1.0.0-incubating/row-hasher.md b/docs/fluo-recipes/1.0.0-incubating/row-hasher.md
new file mode 100644
index 0000000..6a4e565
--- /dev/null
+++ b/docs/fluo-recipes/1.0.0-incubating/row-hasher.md
@@ -0,0 +1,122 @@
+---
+layout: recipes-doc
+title: Row hash prefix recipe
+version: 1.0.0-incubating
+---
+## Background
+
+Transactions are implemented in Fluo using conditional mutations.  Conditional
+mutations require server side processing on tservers.  If data is not spread
+evenly, it can cause some tservers to execute more conditional mutations than
+others.  These tservers doing more work can become a bottleneck.  Most real
+world data is not uniform and can cause this problem.
+
+Before the Fluo [Webindex example][1] started using this recipe it suffered
+from this problem.  The example was using reverse dns encoded URLs for row keys
+like `p:com.cnn/story1.html`.  This made certain portions of the table more
+popular, which in turn made some tservers do much more work.  This uneven
+distribution of work led to lower throughput and uneven performance.  Using
+this recipe made those problems go away.
+
+## Solution
+
+This recipe provides code to help add a hash of the row as a prefix of the row.
+Using this recipe, rows are structured as follows.
+
+```
+<prefix>:<fixed len row hash>:<user row>
+```
+
+The recipe also provides code to help generate split points and configure
+balancing of the prefix.
+
+## Example Use
+
+```java
+import org.apache.fluo.api.config.FluoConfiguration;
+import org.apache.fluo.api.data.Bytes;
+import org.apache.fluo.recipes.core.common.TableOptimizations;
+import org.apache.fluo.recipes.core.data.RowHasher;
+
+public class RowHasherExample {
+
+
+  private static final RowHasher PAGE_ROW_HASHER = new RowHasher("p");
+
+  // Provide one place to obtain row hasher.
+  public static RowHasher getPageRowHasher() {
+    return PAGE_ROW_HASHER;
+  }
+
+  public static void main(String[] args) {
+    RowHasher pageRowHasher = getPageRowHasher();
+
+    String revUrl = "org.wikipedia/accumulo";
+
+    // Add a hash prefix to the row. Use this hashedRow in your transaction
+    Bytes hashedRow = pageRowHasher.addHash(revUrl);
+    System.out.println("hashedRow      : " + hashedRow);
+
+    // Remove the prefix. This can be used by transactions dealing with the hashed row.
+    Bytes orig = pageRowHasher.removeHash(hashedRow);
+    System.out.println("orig           : " + orig);
+
+
+    // Generate table optimizations for the recipe. This can be called when setting up an
+    // application that uses a hashed row.
+    int numTablets = 20;
+
+    // The following code would normally be called before initializing Fluo. This code
+    // registers table optimizations for your prefix+hash.
+    FluoConfiguration conf = new FluoConfiguration();
+    RowHasher.configure(conf, PAGE_ROW_HASHER.getPrefix(), numTablets);
+
+    // Normally you would not call the following code, it would be called automatically for you by
+    // TableOperations.optimizeTable(). Calling this code here to show what table optimization will
+    // be generated.
+    TableOptimizations tableOptimizations = new RowHasher.Optimizer()
+        .getTableOptimizations(PAGE_ROW_HASHER.getPrefix(), conf.getAppConfiguration());
+    System.out.println("Balance config : " + tableOptimizations.getTabletGroupingRegex());
+    System.out.println("Splits         : ");
+    tableOptimizations.getSplits().forEach(System.out::println);
+    System.out.println();
+  }
+}
+```
+
+The example program above prints the following.
+
+```
+hashedRow      : p:1yl0:org.wikipedia/accumulo
+orig           : org.wikipedia/accumulo
+Balance config : (\Qp:\E).*
+Splits         : 
+p:1sst
+p:3llm
+p:5eef
+p:7778
+p:9001
+p:assu
+p:clln
+p:eeeg
+p:g779
+p:i002
+p:jssv
+p:lllo
+p:neeh
+p:p77a
+p:r003
+p:sssw
+p:ullp
+p:weei
+p:y77b
+p:~
+```
+
+The split points are used to create tablets in the Accumulo table used by Fluo.
+Data and computation will spread very evenly across these tablets.  The
+Balancing config will spread the tablets evenly across the tablet servers,
+which will spread the computation evenly. See the [table optimizations][2]
+documentation for information on how to apply the optimizations.
+ 
+[1]: https://github.com/fluo-io/webindex
+[2]: /docs/fluo-recipes/1.0.0-incubating/table-optimization/

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/docs/fluo-recipes/1.0.0-incubating/serialization.md
----------------------------------------------------------------------
diff --git a/docs/fluo-recipes/1.0.0-incubating/serialization.md b/docs/fluo-recipes/1.0.0-incubating/serialization.md
new file mode 100644
index 0000000..8c9b3fc
--- /dev/null
+++ b/docs/fluo-recipes/1.0.0-incubating/serialization.md
@@ -0,0 +1,76 @@
+---
+layout: recipes-doc
+title: Serializing Data
+version: 1.0.0-incubating
+---
+Various Fluo Recipes deal with POJOs and need to serialize them.  The
+serialization mechanism is configurable and defaults to using [Kryo][1].
+
+## Custom Serialization
+
+In order to use a custom serialization method, two steps need to be taken.  The
+first step is to implement [SimpleSerializer][2].  The second step is to
+configure Fluo Recipes to use the custom implementation.  This needs to be done
+before initializing Fluo.  Below is an example of how to do this.
+
+```java
+  FluoConfiguration fluoConfig = ...;
+  //assume MySerializer implements SimpleSerializer
+  SimpleSerializer.setSerializer(fluoConfig, MySerializer.class);
+  //initialize Fluo using fluoConfig
+```
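+
+For reference, below is a rough sketch of what a custom implementation might look
+like, using plain Java serialization.  This sketch assumes [SimpleSerializer][2]
+exposes `init`, `serialize`, and `deserialize` methods roughly as written here;
+check the API docs for the exact method signatures before relying on it.
+
+```java
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+
+import org.apache.fluo.api.config.SimpleConfiguration;
+import org.apache.fluo.recipes.core.serialization.SimpleSerializer;
+
+public class MySerializer implements SimpleSerializer {
+
+  @Override
+  public void init(SimpleConfiguration appConfig) {
+    // plain Java serialization needs no configuration
+  }
+
+  @Override
+  public <T> byte[] serialize(T obj) {
+    try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        ObjectOutputStream oos = new ObjectOutputStream(baos)) {
+      oos.writeObject(obj);
+      oos.flush();
+      return baos.toByteArray();
+    } catch (Exception e) {
+      throw new IllegalStateException(e);
+    }
+  }
+
+  @Override
+  public <T> T deserialize(byte[] serObj, Class<T> clazz) {
+    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(serObj))) {
+      return clazz.cast(ois.readObject());
+    } catch (Exception e) {
+      throw new IllegalStateException(e);
+    }
+  }
+}
+```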
+
+## Kryo Factory
+
+If using the default Kryo serializer implementation, then creating a
+KryoFactory implementation can lead to smaller serialization size.  When Kryo
+serializes an object graph, it will by default include the fully qualified
+names of the classes in the serialized data.  This can be avoided by
+[registering classes][3] that will be serialized.  Registration is done by
+creating a KryoFactory and then configuring Fluo Recipes to use it.   The
+example below shows how to do this.
+
+For example assume the POJOs named `Node` and `Edge` will be serialized and
+need to be registered with Kryo.  This could be done by creating a KryoFactory
+like the following.
+
+```java
+
+package com.foo;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.pool.KryoFactory;
+
+import com.foo.data.Edge;
+import com.foo.data.Node;
+
+public class MyKryoFactory implements KryoFactory {
+  @Override
+  public Kryo create() {
+    Kryo kryo = new Kryo();
+    
+    //Explicitly assign each class a unique id here to ensure it is stable over
+    //time and in different environments with different dependencies.
+    kryo.register(Node.class, 9);
+    kryo.register(Edge.class, 10);
+    
+    //instruct kryo that these are the only classes we expect to be serialized
+    kryo.setRegistrationRequired(true);
+    
+    return kryo;
+  }
+}
+```
+
+Fluo Recipes must be configured to use this factory.  The following code shows
+how to do this.
+
+```java
+  FluoConfiguration fluoConfig = ...;
+  KryoSimplerSerializer.setKryoFactory(fluoConfig, MyKryoFactory.class);
+  //initialize Fluo using fluoConfig
+```
+
+[1]: https://github.com/EsotericSoftware/kryo
+[2]: {{ site.api_static }}/fluo-recipes-core/1.0.0-incubating/apache/fluo/recipes/core/serialization/SimpleSerializer.html
+[3]: https://github.com/EsotericSoftware/kryo#registration

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/docs/fluo-recipes/1.0.0-incubating/table-optimization.md
----------------------------------------------------------------------
diff --git a/docs/fluo-recipes/1.0.0-incubating/table-optimization.md b/docs/fluo-recipes/1.0.0-incubating/table-optimization.md
new file mode 100644
index 0000000..d3d9b43
--- /dev/null
+++ b/docs/fluo-recipes/1.0.0-incubating/table-optimization.md
@@ -0,0 +1,64 @@
+---
+layout: recipes-doc
+title: Fluo Table Optimization
+version: 1.0.0-incubating
+---
+## Background
+
+Recipes may need to make Accumulo specific table modifications for optimal
+performance.  Configuring the Accumulo tablet balancer and adding splits are
+two optimizations that are currently done.  Offering a standard way to do these
+optimizations makes it easier to use recipes correctly.  These optimizations
+are optional.  You could skip them for integration testing, but would probably
+want to use them in production.
+
+## Java Example
+
+```java
+FluoConfiguration fluoConf = ...
+
+//export queue configure method will return table optimizations it would like made
+ExportQueue.configure(fluoConf, ...);
+
+//CollisionFreeMap.configure() will return table optimizations it would like made
+CollisionFreeMap.configure(fluoConf, ...);
+
+//configure optimizations for a prefixed hash range of a table
+RowHasher.configure(fluoConf, ...);
+
+//initialize Fluo
+FluoFactory.newAdmin(fluoConf).initialize(...)
+
+//Automatically optimize the Fluo table for all configured recipes
+TableOperations.optimizeTable(fluoConf);
+```
+
+[TableOperations][2] is provided in the Accumulo module of Fluo Recipes.
+
+## Command Example
+
+Fluo Recipes provides an easy way to optimize a Fluo table for configured
+recipes from the command line.  This should be done after configuring recipes
+and initializing Fluo.  Below are example commands for initializing in this way.
+
+```bash
+
+#create application 
+fluo new app1
+
+#configure application
+
+#initialize Fluo
+fluo init app1
+
+#optimize table for all configured recipes
+fluo exec app1 org.apache.fluo.recipes.accumulo.cmds.OptimizeTable
+```
+
+## Table optimization registry
+
+Recipes register themselves by calling [TableOptimizations.registerOptimization()][1].  Anyone can use
+this mechanism; it is not limited to use by existing recipes.
+
+[1]: {{ site.api_static }}/fluo-recipes-core/1.0.0-incubating/apache/fluo/recipes/core/common/TableOptimizations.html
+[2]: {{ site.api_static }}/fluo-recipes-accumulo/1.0.0-incubating/apache/fluo/recipes/accumulo/ops/TableOperations.html

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/docs/fluo-recipes/1.0.0-incubating/testing.md
----------------------------------------------------------------------
diff --git a/docs/fluo-recipes/1.0.0-incubating/testing.md b/docs/fluo-recipes/1.0.0-incubating/testing.md
new file mode 100644
index 0000000..77a604e
--- /dev/null
+++ b/docs/fluo-recipes/1.0.0-incubating/testing.md
@@ -0,0 +1,14 @@
+---
+layout: recipes-doc
+title: Testing
+version: 1.0.0-incubating
+---
+Fluo includes MiniFluo which makes it possible to write an integration test that
+runs against a real Fluo instance.  Fluo Recipes provides the following utility
+code for writing an integration test.
+
+ * [FluoITHelper][1] A class with utility methods for comparing expected data with what is in Fluo.
+ * [AccumuloExportITBase][2] A base class for writing an integration test that exports data from Fluo to an external Accumulo table.
+
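+For orientation, a MiniFluo-backed integration test is typically structured
+roughly like the sketch below.  This is a generic skeleton, not code from Fluo
+Recipes; the configuration calls and test data are assumptions to adapt to your
+own application, and [FluoITHelper][1] would be used where the comment indicates.
+
+```java
+import org.apache.fluo.api.client.FluoClient;
+import org.apache.fluo.api.client.FluoFactory;
+import org.apache.fluo.api.client.Transaction;
+import org.apache.fluo.api.config.FluoConfiguration;
+import org.apache.fluo.api.data.Bytes;
+import org.apache.fluo.api.data.Column;
+import org.apache.fluo.api.mini.MiniFluo;
+import org.junit.Test;
+
+public class ExampleIT {
+
+  @Test
+  public void testCounter() throws Exception {
+    FluoConfiguration conf = new FluoConfiguration();
+    conf.setApplicationName("test");
+    conf.setMiniStartAccumulo(true);
+
+    try (MiniFluo mini = FluoFactory.newMiniFluo(conf);
+        FluoClient client = FluoFactory.newClient(mini.getClientConfiguration())) {
+
+      // write some test data
+      try (Transaction tx = client.newTransaction()) {
+        tx.set(Bytes.of("row1"), new Column("meta", "counter1"), Bytes.of("1"));
+        tx.commit();
+      }
+
+      // wait for any configured observers to finish processing notifications
+      mini.waitForObservers();
+
+      // compare expected data with what is in Fluo, e.g. using FluoITHelper
+    }
+  }
+}
+```
+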
+[1]: {{ site.api_static }}/fluo-recipes-test/1.0.0-incubating/apache/fluo/recipes/test/FluoITHelper.html
+[2]: {{ site.api_static }}/fluo-recipes-test/1.0.0-incubating/apache/fluo/recipes/test/AccumuloExportITBase.html

http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/blob/cbb726c6/docs/fluo-recipes/1.0.0-incubating/transient.md
----------------------------------------------------------------------
diff --git a/docs/fluo-recipes/1.0.0-incubating/transient.md b/docs/fluo-recipes/1.0.0-incubating/transient.md
new file mode 100644
index 0000000..06029cb
--- /dev/null
+++ b/docs/fluo-recipes/1.0.0-incubating/transient.md
@@ -0,0 +1,83 @@
+---
+layout: recipes-doc
+title: Transient data
+version: 1.0.0-incubating
+---
+## Background
+
+Some recipes store transient data in a portion of the Fluo table.  Transient
+data is data that is continually being added and deleted.  Also, these transient
+data ranges contain no long-term data.  Because of the way Fluo works, when data
+is deleted a delete marker is inserted but the data is actually still there.  Over
+time these transient ranges of the table will accumulate many more delete markers
+than actual data, and if nothing is done, processing transient data will get
+increasingly slower.
+
+These delete markers can be cleaned up by forcing Accumulo to compact the
+Fluo table, which will run Fluo's garbage collection iterator. However,
+compacting the entire table to clean up these ranges within a table is
+overkill. Alternatively,  Accumulo supports compacting ranges of a table.   So
+a good solution to the delete marker problem is to periodically compact just
+the transient ranges. 
+
+Fluo Recipes provides helper code to deal with transient data ranges in a
+standard way.
+
+## Registering Transient Ranges
+
+Recipes like [Export Queue](export-queue.md) will automatically register
+transient ranges when configured.  If you would like to register your own
+transient ranges, use [TransientRegistry][1].  Below is a simple example of
+using this.
+
+```java
+FluoConfiguration fluoConfig = ...;
+TransientRegistry transientRegistry = new TransientRegistry(fluoConfig.getAppConfiguration());
+transientRegistry.addTransientRange(new RowRange(startRow, endRow));
+
+//Initialize Fluo using fluoConfig. This will store the registered ranges in
+//zookeeper, making them available on any node later.
+```
+
+## Compacting Transient Ranges
+
+Although you may never need to register transient ranges directly, you will
+need to periodically compact transient ranges if using a recipe that registers
+them.  Using [TableOperations][2] this can be done with one line of Java code
+like the following.
+
+```java
+FluoConfiguration fluoConfig = ...;
+TableOperations.compactTransient(fluoConfig);
+```
+
+Fluo recipes provides an easy way to compact transient ranges from the command line using the `fluo exec` command as follows:
+
+```
+fluo exec <app name> org.apache.fluo.recipes.accumulo.cmds.CompactTransient [<interval> [<multiplier>]]
+```
+
+If no arguments are specified the command will call `compactTransient()` once.
+If `<interval>` is specified the command will run forever, compacting transient
+ranges and sleeping `<interval>` seconds between compacting each transient range.
+
+In the case where Fluo is backed up in processing data a transient range could
+have a lot of data queued and compacting it too frequently would be
+counterproductive.  To avoid this the `CompactTransient` command will consider
+the time it took to compact a range when deciding when to compact that range
+next.  This is where the `<multiplier>` argument comes in: the time to sleep
+between compactions of a range is determined as follows.  If not specified, the
+multiplier defaults to 3.
+
+```java
+   sleepTime = Math.max(compactTime * multiplier, interval);
+```
+
+For example assume a Fluo application has two transient ranges.  Also assume
+CompactTransient is run with an interval of 600 and a multiplier of 10.  If the
+first range takes 20 seconds to compact, then it will be compacted again in 600
+seconds.  If the second range takes 80 seconds to compact, then it will be
+compacted again in 800 seconds.
+
+[1]: {{ site.api_static }}/fluo-recipes-core/1.0.0-incubating/apache/fluo/recipes/core/common/TransientRegistry.html
+[2]: {{ site.api_static }}/fluo-recipes-accumulo/1.0.0-incubating/apache/fluo/recipes/accumulo/ops/TableOperations.html

