fluo-commits mailing list archives

From ktur...@apache.org
Subject [5/5] incubator-fluo-website git commit: Updates based on feedback from @joshelser on mailing list
Date Mon, 10 Oct 2016 19:13:31 GMT
Updates based on feedback from @joshelser on mailing list

Project: http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/commit/b3592332
Tree: http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/tree/b3592332
Diff: http://git-wip-us.apache.org/repos/asf/incubator-fluo-website/diff/b3592332

Branch: refs/heads/gh-pages
Commit: b3592332b02deac09c829f9b51ca9167e7c054fa
Parents: ef3b2b7
Author: Keith Turner <keith@deenlo.com>
Authored: Thu Oct 6 13:57:28 2016 -0400
Committer: Keith Turner <keith@deenlo.com>
Committed: Mon Oct 10 19:10:01 2016 -0400

 resources/tour/RowLocking.png     | Bin 0 -> 59627 bytes
 tour/application-configuration.md |   4 ++--
 tour/exercise-1.md                |  13 +++++++------
 tour/row-locking.md               |  32 +++++++++++++++++++++++++++++---
 4 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/resources/tour/RowLocking.png b/resources/tour/RowLocking.png
new file mode 100644
index 0000000..30308f7
Binary files /dev/null and b/resources/tour/RowLocking.png differ

diff --git a/tour/application-configuration.md b/tour/application-configuration.md
index 143171f..676d2f1 100644
--- a/tour/application-configuration.md
+++ b/tour/application-configuration.md
@@ -5,14 +5,14 @@ title: Application Configuration
 Fluo applications are distributed applications where code is running on many separate machines.
 Getting configuration to these distributed processes can be tricky and cumbersome.  Fluo
 provides two simple mechanisms to assist with this: application configuration and observer configuration.
-This configuration data is stored in zookeeper when an application is initialized.  After
+This configuration data is stored in ZooKeeper when an application is initialized.  After
 initialization any Fluo client or Observer can access the configuration.
 ## Application Configuration
 To use application configuration, set properties with the prefix `fluo.app` in your configuration
 file before initialization.  Alternatively use [FluoConfiguration.getAppConfiguration()][fcogac] to
-set these properties programmatically.  After fluo is initialized this information can be
+set these properties programmatically.  After Fluo is initialized this information can be
 accessed anywhere by calling [FluoClient.getAppConfiguration()][fclgac],
 [Observer.Context.getAppConfiguration()][ocgac], or [Loader.Context.getAppConfiguration()][lcgac].

diff --git a/tour/exercise-1.md b/tour/exercise-1.md
index 2c29aba..a5f2cdc 100644
--- a/tour/exercise-1.md
+++ b/tour/exercise-1.md
@@ -2,8 +2,12 @@
 title: Word count Exercise
-This exercise will show you how to create a simple system that computes word counts for unique
-documents. This system should do the following.
+This exercise gives you an opportunity to use everything you have learned so
+far to attempt writing a simple Fluo application.  A bare minimum of code,
+along with a conceptual sketch of a solution, is provided to get you started.
+The application should compute word counts for unique documents. This
+application should do the following.
  * Deduplicate content based on hash
  * Count how many URIs reference content
@@ -14,9 +18,6 @@ documents. This system should do the following.
  * Partition different types of data using row prefixes.  Use *u:* for URIs, use *d:* for
    content, and use *w:* for word counts.
-A skeleton of the code needed to implement this exercise is provided below along with some
-test data.
 ## Part 1 : Loading data.
 The class below is a simple POJO for documents.
@@ -400,4 +401,4 @@ After implementing the Observer, the output of the program should look like the
 The way to compute word counts above is very prone to transactional collisions. One way to avoid
 these collisions is to use the CollisionFreeMap provided in Fluo Recipes. Currently Fluo Recipes is
-not released, this section will be updated with more information once it is.
\ No newline at end of file
+not released, this section will be updated with more information once it is.

diff --git a/tour/row-locking.md b/tour/row-locking.md
index a105611..b879822 100644
--- a/tour/row-locking.md
+++ b/tour/row-locking.md
@@ -5,16 +5,41 @@ title: Row Locking
 Fluo relies on Accumulo's conditional mutations to implement cross node
 transactions.  Conditional mutations lock entire rows on the server side when
 checking conditions.  These row locks can impact the performance of your
-transactions, so its something to be aware of when designing a schema.
+transactions, so it's something to be aware of when designing a schema.
+For example, the following illustration shows multiple Fluo clients executing
+transactions.  These transactions update different columns in the same row.
+The transactions will not collide; however, they may end up waiting on each
+other because Accumulo locks `Row 1` to process each update.
+<!-- source for figure : https://docs.google.com/drawings/d/1CpUBE5kEGHoZUCUdO9MMyHksgZHylAUbQVJYlrp-DF0/edit?usp=sharing -->
+Determining whether this problem will impact you depends on your schema and the
+probability of concurrent updates.  Mitigating action is only needed if the
+following criteria are met.
+ * Many transactions will update separate columns in a row.
+ * Those transactions are very likely to run concurrently.
+If both of the conditions above are met then transactions will likely wait
+unnecessarily.  One simple way to avoid the wait is to move some of the
+information that was in the column into the row.  In the example above the
+information in the column could be appended to the row.  Then the transactions
+would be updating rows `Row 1:U`, `Row 1:V`, `Row 1:W`, and `Row 1:X`.  Since
+these are separate rows, lock contention is avoided in Accumulo tablet servers.
+## Example
 The following code demonstrates the impact of schema design on performance. The
 code adds lots of edges to a single node in a graph using many transactions and
 threads. All of the edges are added to a single row.
 These performance problems may not occur on a single node with a single client,
 because Fluo clients batch a lot of operations related to committing.  To make
-the problem more apparent, the following code creates three clients and three
+the problem more apparent on a single node, the following code creates three
+clients and three loaders.
   public static class EdgeLoader implements Loader {
@@ -75,4 +100,5 @@ the time it takes.  The change below spreads the edges over many rows.
+[fig1]: /resources/tour/RowLocking.png
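
The row-locking mitigation described in the `tour/row-locking.md` hunk (moving the column into the row) can be illustrated with a small standalone sketch. It does not use the Fluo or Accumulo APIs. The class `RowKeySketch` and the helpers `spreadRow` and `distinctRows` are hypothetical names introduced here, and the `:` separator is assumed from the `Row 1:U` example in the text. The sketch only counts how many distinct rows, and therefore how many distinct row locks, a set of updates would touch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class RowKeySketch {

  // Hypothetical helper: append the column to the row so that each
  // column's updates land in a separate row.
  static String spreadRow(String row, String column) {
    return row + ":" + column;
  }

  // Number of distinct rows touched by a list of updates. Since the lock
  // is taken per row, this is also the number of distinct locks involved.
  static int distinctRows(List<String> rows) {
    return new HashSet<>(rows).size();
  }

  public static void main(String[] args) {
    List<String> columns = Arrays.asList("U", "V", "W", "X");

    List<String> sameRow = new ArrayList<>();
    List<String> spreadRows = new ArrayList<>();
    for (String column : columns) {
      sameRow.add("Row 1");                       // original schema
      spreadRows.add(spreadRow("Row 1", column)); // column moved into row
    }

    System.out.println(distinctRows(sameRow));    // prints 1
    System.out.println(distinctRows(spreadRows)); // prints 4
  }
}
```

With the original schema all four updates contend for one row lock. With the column folded into the row, each update has its own row, so concurrent transactions no longer wait on each other for that lock.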
