Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 561DC200D06 for ; Mon, 25 Sep 2017 21:13:23 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 54B471609BB; Mon, 25 Sep 2017 19:13:23 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id C320B1609B5 for ; Mon, 25 Sep 2017 21:13:21 +0200 (CEST) Received: (qmail 29323 invoked by uid 500); 25 Sep 2017 19:13:21 -0000 Mailing-List: contact commits-help@fluo.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@fluo.apache.org Delivered-To: mailing list commits@fluo.apache.org Received: (qmail 29314 invoked by uid 99); 25 Sep 2017 19:13:20 -0000 Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 25 Sep 2017 19:13:20 +0000 Received: by gitbox.apache.org (ASF Mail Server at gitbox.apache.org, from userid 33) id 81BE281788; Mon, 25 Sep 2017 19:13:18 +0000 (UTC) Date: Mon, 25 Sep 2017 19:13:18 +0000 To: "commits@fluo.apache.org" Subject: [fluo] branch master updated: Fixes #925 - Move Fluo documentation to project website (#926) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Message-ID: <150636679863.4160.12042425846497145449@gitbox.apache.org> From: mwalch@apache.org Reply-To: "commits@fluo.apache.org" X-Git-Host: gitbox.apache.org X-Git-Repo: fluo X-Git-Refname: refs/heads/master X-Git-Reftype: branch X-Git-Oldrev: 00befd8dfeebe47b2edb8105a86fc8b7b069b92d X-Git-Newrev: 963be6326f10e6336138d3936613111bff2af27b X-Git-Rev: 963be6326f10e6336138d3936613111bff2af27b X-Git-NotificationType: ref_changed_plus_diff X-Git-Multimail-Version: 1.5.dev Auto-Submitted: auto-generated archived-at: Mon, 25 Sep 2017 19:13:23 -0000 This is an automated email from the ASF dual-hosted git repository. mwalch pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/fluo.git The following commit(s) were added to refs/heads/master by this push: new 963be63 Fixes #925 - Move Fluo documentation to project website (#926) 963be63 is described below commit 963be6326f10e6336138d3936613111bff2af27b Author: Mike Walch AuthorDate: Mon Sep 25 15:13:16 2017 -0400 Fixes #925 - Move Fluo documentation to project website (#926) --- docs/contributing.md => CONTRIBUTING.md | 0 README.md | 28 +-- docs/applications.md | 326 ------------------------- docs/architecture.md | 56 ----- docs/grafana.md | 97 -------- docs/install.md | 128 ---------- docs/metrics.md | 130 ---------- docs/resources/fluo-architecture.odg | Bin 16670 -> 0 bytes docs/resources/fluo-architecture.png | Bin 61085 -> 0 bytes modules/distribution/src/main/assembly/bin.xml | 4 - 10 files changed, 3 insertions(+), 766 deletions(-) diff --git a/docs/contributing.md b/CONTRIBUTING.md similarity index 100% rename from docs/contributing.md rename to CONTRIBUTING.md diff --git a/README.md b/README.md index c6c1238..46d0e9a 100644 --- a/README.md +++ b/README.md @@ -29,38 +29,16 @@ of a large-scale computation, index, or analytic as new data is discovered. Chec ## Getting Started * Take the [Fluo Tour][tour] if you are completely new to Fluo. 
-* Read the [install instructions][install] to install Fluo and start a Fluo application in YARN on a - cluster where Accumulo, Hadoop & Zookeeper are running. If you need help setting up these +* Read the [Fluo documentation][fluo-docs] to learn how to install Fluo and start a Fluo application + on a cluster where Accumulo, Hadoop & Zookeeper are running. If you need help setting up these dependencies, see the [related projects page][related] for external projects that may help. -## Applications - -Below are helpful resources for Fluo application developers: - -* [Instructions][apps] for creating Fluo applications -* [Fluo API][api] javadocs -* [Fluo Recipes][recipes] is a project that provides common code for Fluo application developers - implemented using the Fluo API. - -## Implementation - -* [Architecture] - Overview of Fluo's architecture -* [Contributing] - Documentation for developers who want to contribute to Fluo -* [Metrics] - Fluo metrics are visible via JMX by default but can be configured to send to Graphite - or Ganglia - [fluo]: https://fluo.apache.org/ [related]: https://fluo.apache.org/related-projects/ [tour]: https://fluo.apache.org/tour/ [accumulo]: https://accumulo.apache.org [percolator]: https://research.google.com/pubs/pub36726.html -[install]: docs/install.md -[apps]: docs/applications.md -[api]: https://fluo.apache.org/apidocs/ -[recipes]: https://github.com/apache/fluo-recipes -[Metrics]: docs/metrics.md -[Contributing]: docs/contributing.md -[Architecture]: docs/architecture.md +[fluo-docs]: https://fluo.apache.org/docs/ [ti]: https://travis-ci.org/apache/fluo.svg?branch=master [tl]: https://travis-ci.org/apache/fluo [li]: http://img.shields.io/badge/license-ASL-blue.svg diff --git a/docs/applications.md b/docs/applications.md deleted file mode 100644 index cc2001b..0000000 --- a/docs/applications.md +++ /dev/null @@ -1,326 +0,0 @@ - - -# Fluo Applications - -Once you have Fluo installed and running on your cluster, you can run Fluo applications consisting -of [clients and observers](architecture.md). This documentation will show how to: - - * Create a Fluo client - * Create a Fluo observer - * Initialize a Fluo Application - * Start and stop a Fluo application (which consists of Oracle and Worker processes) - -## Fluo Maven Dependencies - -For both clients and observers, you will need to include the following in your Maven pom:

```xml
<dependency>
  <groupId>org.apache.fluo</groupId>
  <artifactId>fluo-api</artifactId>
  <version>1.1.0-incubating</version>
</dependency>
<dependency>
  <groupId>org.apache.fluo</groupId>
  <artifactId>fluo-core</artifactId>
  <version>1.1.0-incubating</version>
  <scope>runtime</scope>
</dependency>
```

Fluo provides a classpath command to help users build a runtime classpath. This command along with -the `hadoop jar` command is useful when writing scripts to run Fluo client code. These commands -allow the scripts to use the versions of Hadoop, Accumulo, and Zookeeper installed on a cluster. - -## Creating a Fluo client - -To create a [FluoClient], you will need to provide it with a [FluoConfiguration] object that is -configured to connect to your Fluo instance.
- -If you have access to the [fluo-conn.properties] file that was used to configure your Fluo instance, you -can use it to build a [FluoConfiguration] object with all necessary properties: - -```java -FluoConfiguration config = new FluoConfiguration(new File("fluo-conn.properties")); -config.setApplicationName("myapp"); -``` - -You can also create an empty [FluoConfiguration] object and set properties using Java: - -```java -FluoConfiguration config = new FluoConfiguration(); -config.setInstanceZookeepers("localhost/fluo"); -config.setApplicationName("myapp"); -``` - -Once you have a [FluoConfiguration] object, pass it to the `newClient()` method of [FluoFactory] to -create a [FluoClient]: - -```java -try(FluoClient client = FluoFactory.newClient(config)){ - - try (Transaction tx = client.newTransaction()) { - // read and write some data - tx.commit(); - } - - try (Snapshot snapshot = client.newSnapshot()) { - //read some data - } -} -``` - -It may help to reference the [API javadocs][API] while you are learning the Fluo API. - -## Creating a Fluo observer - -To create an observer, follow these steps: - -1. Create one or more classes that implement [Observer] like the example below. Please use [slf4j] for - any logging in observers as [slf4j] supports multiple logging implementations. This is - necessary as Fluo applications have a hard requirement on [logback] when running in YARN. - - ```java - public class InvertObserver implements Observer { - - @Override - public void process(TransactionBase tx, Bytes row, Column col) throws Exception { - // read value - Bytes value = tx.get(row, col); - // invert row and value - tx.set(value, new Column("inv", "data"), row); - } - } - ``` - -2. Create a class that implements [ObserverProvider] like the example below. The purpose of this - class is to associate a set of Observers with columns that trigger the observers. The class can - register multiple observers. - - ```java - class AppObserverProvider implements ObserverProvider { - @Override - public void provide(Registry or, Context ctx) { - //setup InvertObserver to be triggered when the column obs:data is modified - or.forColumn(new Column("obs", "data"), NotificationType.STRONG) - .useObserver(new InvertObserver()); - - //Observer is a Functional interface. So Observers can be written as lambdas. - or.forColumn(new Column("new","data"), NotificationType.WEAK) - .useObserver((tx,row,col) -> { - Bytes combined = combineNewAndOld(tx,row); - tx.set(row, new Column("current","data"), combined); - }); - } - } - ``` - -3. Build a jar containing these classes and include this jar in the `lib/` directory of your Fluo - application. -4. Configure your Fluo application to use this observer provider by modifying the Application section of - [fluo-app.properties]. Set `fluo.observer.provider` to the observer provider class name. -5. Initialize your Fluo application as described in the next section. During initialization Fluo - will obtain the observed columns from the ObserverProvider and persist the columns in Zookeeper. - These columns persisted in Zookeeper are used by transactions to know when to trigger observers. - -## Initializing a Fluo Application - -Before a Fluo Application can run, it must be initialized. Below is an overview of what -initialization does and some of the properties that may be set for initialization. - - * **Initialize ZooKeeper** : Each application has its own area in ZooKeeper used for configuration, - Oracle state, and worker coordination.
All properties, except `fluo.connections.*`, are copied - into ZooKeeper. For example, if `fluo.worker.num.threads=128` was set, then when a worker process - starts it will read this from ZooKeeper. - * **Copy Observer jars to DFS** : Fluo worker processes need the jars containing observers. These - are provided in one of the following ways: - * Set the property `fluo.observer.init.dir` to a local directory containing observer jars. The - jars in this directory are copied to DFS under `/`. When a worker is - started, the jars are pulled from DFS and added to its classpath. - * Set the property `fluo.observer.jars.url` to a directory in DFS containing observer jars. No - copying is done. When a worker is started, the jars are pulled from this location and added to - its classpath. - * Do not set any of the properties above and have the mechanism that starts the worker process - add the needed jars to the classpath. - * **Create Accumulo table** : Each Fluo application creates and configures an Accumulo table. The - `fluo.accumulo.*` properties determine which Accumulo instance is used. For performance reasons, - Fluo runs its own code in Accumulo tablet servers. Fluo attempts to copy Fluo jars into DFS and - configure Accumulo to use them. Fluo first checks the property `fluo.accumulo.jars` and if set, - copies the jars listed there. If that property is not set, then Fluo looks on the classpath to - find jars. Jars are copied to a location under `/`. - -Below are the steps to initialize an application from the command line. It is also possible to -initialize an application using Fluo's Java API. - -1. Create a copy of [fluo-app.properties] for your Fluo application. - - cp $FLUO_HOME/conf/fluo-app.properties /path/to/myapp/fluo-app.properties - -2. Edit your copy of [fluo-app.properties] and make sure to set the following: - - * Class name of your ObserverProvider - * Paths to your Fluo observer jars - * Accumulo configuration - * DFS configuration - - When configuring the observer section of fluo-app.properties, you can configure your instance for the - [phrasecount] application if you have not created your own application. See the [phrasecount] - example for instructions. You can also choose not to configure any observers, but your workers will - be idle when started. - -3. Run the command below to initialize your Fluo application. Change `myapp` to your application name: - - fluo init myapp /path/to/myapp/fluo-app.properties - - A Fluo application only needs to be initialized once. After initialization, the Fluo application - name is used to start/stop the application and scan the Fluo table. - -4. Run `fluo list`, which connects to Fluo and lists applications, to verify initialization. - -5. Run `fluo config myapp` to see what configuration is stored in ZooKeeper. - -## Starting your Fluo application - -Follow the instructions below to start a Fluo application, which contains an oracle and multiple workers. - -1. Configure [fluo-env.sh] and [fluo-conn.properties] if you have not already. - -2. Run Fluo application processes using the `fluo oracle` and `fluo worker` commands. Fluo applications - are typically run with one oracle process and multiple worker processes.
The commands below will start - a Fluo oracle and two workers on your local machine: - - fluo oracle myapp &> oracle.log & - fluo worker myapp &> worker1.log & - fluo worker myapp &> worker2.log & - - The commands will retrieve your application configuration and observer jars (using your - application name) before starting the oracle or worker process. - -If you want to distribute the processes of your Fluo application across a cluster, you will need to install -Fluo on every node where you want to run a Fluo process and follow the instructions above on each node. - -## Managing your Fluo application - -When you have data in your Fluo application, you can view it using the command `fluo scan myapp`. -Pipe the output to `less` using the command `fluo scan myapp | less` if you want to page through the data. - -To list all Fluo applications, run `fluo list`. - -To stop your Fluo application, run `jps -m | grep Fluo` to find process IDs and use `kill` to stop them. - -## Running application code - -The `fluo exec {arguments}` command provides an easy way to execute application code. It -will execute a class with a main method if a jar containing the class is included with the observer -jars configured at initialization. When the class is run, Fluo classes and dependencies will be on -the classpath. The `fluo exec` command can inject the application's configuration if the class is -written in the following way. Defining the injection point is optional. - -```java -import javax.inject.Inject; - -public class AppCommand { - - //when run with the fluo exec command, the application's configuration will be injected - @Inject - private static FluoConfiguration fluoConfig; - - public static void main(String[] args) throws Exception { - try(FluoClient fluoClient = FluoFactory.newClient(fluoConfig)) { - //do stuff with Fluo - } - } -} -``` - -## Application Configuration - -For configuring observers, Fluo provides a simple mechanism to set and access application-specific -configuration. See the javadoc on [FluoClient].getAppConfiguration() for more details. - -## Debugging Applications - -While monitoring [Fluo metrics][metrics] can detect problems (like too many transaction collisions) -in a Fluo application, [metrics][metrics] may not provide enough information to debug the root cause -of the problem. To help debug Fluo applications, low-level logging of transactions can be turned on -by setting the following loggers to TRACE:

| Logger             | Level | Information                                                                                          |
|--------------------|-------|------------------------------------------------------------------------------------------------------|
| fluo.tx            | TRACE | Provides detailed information about what transactions read and wrote                                  |
| fluo.tx.summary    | TRACE | Provides a one-line summary about each transaction executed                                           |
| fluo.tx.collisions | TRACE | Provides details about what data was involved when a transaction collides with another transaction    |
| fluo.tx.scan       | TRACE | Provides logging for each cell read by a scan. The scan summary is logged at the `fluo.tx` level, which allows suppressing `fluo.tx.scan` while still seeing the summary. |

Below is an example log after setting `fluo.tx` to TRACE. The number following `txid: ` is the -transaction's start timestamp from the Oracle.
- -``` -2015-02-11 18:24:05,341 [fluo.tx ] TRACE: txid: 3 begin() thread: 198 -2015-02-11 18:24:05,343 [fluo.tx ] TRACE: txid: 3 class: com.SimpleLoader -2015-02-11 18:24:05,357 [fluo.tx ] TRACE: txid: 3 get(4333, stat count ) -> null -2015-02-11 18:24:05,357 [fluo.tx ] TRACE: txid: 3 set(4333, stat count , 1) -2015-02-11 18:24:05,441 [fluo.tx ] TRACE: txid: 3 commit() -> SUCCESSFUL commitTs: 4 -2015-02-11 18:24:05,341 [fluo.tx ] TRACE: txid: 5 begin() thread: 198 -2015-02-11 18:24:05,442 [fluo.tx ] TRACE: txid: 3 close() -2015-02-11 18:24:05,343 [fluo.tx ] TRACE: txid: 5 class: com.SimpleLoader -2015-02-11 18:24:05,357 [fluo.tx ] TRACE: txid: 5 get(4333, stat count ) -> 1 -2015-02-11 18:24:05,357 [fluo.tx ] TRACE: txid: 5 set(4333, stat count , 2) -2015-02-11 18:24:05,441 [fluo.tx ] TRACE: txid: 5 commit() -> SUCCESSFUL commitTs: 6 -2015-02-11 18:24:05,442 [fluo.tx ] TRACE: txid: 5 close() -``` - -The log above traces the following sequence of events: - -* Transaction T1 has a start timestamp of `3` -* Thread with id `198` is executing T1; it is running code from the class `com.SimpleLoader` -* T1 reads row `4333` and column `stat count`, which does not exist -* T1 sets row `4333` and column `stat count` to `1` -* T1 commits successfully and its commit timestamp from the Oracle is `4`. -* Transaction T2 has a start timestamp of `5` (because `5` > `4`, it can see what T1 wrote). -* T2 reads a value of `1` for row `4333` and column `stat count` -* T2 sets row `4333` and column `stat count` to `2` -* T2 commits successfully with a commit timestamp of `6` - -Below is an example log after only setting `fluo.tx.collisions` to TRACE. This setting will only log -trace information when a collision occurs. Unlike the previous example, what the transaction read -and wrote is not logged. This shows that a transaction with a start timestamp of `106` and a class -name of `com.SimpleLoader` collided with another transaction on row `r1` and column `fam1 qual1`. - -``` -2015-02-11 18:17:02,639 [tx.collisions] TRACE: txid: 106 class: com.SimpleLoader -2015-02-11 18:17:02,639 [tx.collisions] TRACE: txid: 106 collisions: {r1=[fam1 qual1 ]} -``` - -When applications read and write arbitrary binary data, the trace output is not very readable. In order to make -the trace logs human readable, non-ASCII characters are escaped using hex. The convention used is `\xDD` -where D is a hex digit. Also, the `\` character is escaped to make the output unambiguous.
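To make the escaping convention above concrete, here is a minimal sketch. It is purely illustrative and not code from Fluo: the class and method names are made up, and the uppercase hex and the choice to also escape non-printable ASCII bytes are assumptions, since the text only states that non-ASCII characters become `\xDD` and that `\` is escaped.

```java
// Hypothetical helper illustrating the trace-log escaping convention described above.
public class TraceEscape {
  static String escape(byte[] data) {
    StringBuilder sb = new StringBuilder();
    for (byte b : data) {
      int ch = b & 0xff;
      if (ch == '\\') {
        sb.append("\\\\");                        // the escape character itself is escaped
      } else if (ch >= 0x20 && ch <= 0x7e) {
        sb.append((char) ch);                     // printable ASCII is kept as-is
      } else {
        sb.append(String.format("\\x%02X", ch));  // everything else becomes \xDD
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Prints: row\x00\xFF\\  (assuming the conventions above)
    System.out.println(escape(new byte[] {'r', 'o', 'w', 0x00, (byte) 0xff, '\\'}));
  }
}
```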
- -[FluoFactory]: ../modules/api/src/main/java/org/apache/fluo/api/client/FluoFactory.java -[FluoClient]: ../modules/api/src/main/java/org/apache/fluo/api/client/FluoClient.java -[FluoConfiguration]: ../modules/api/src/main/java/org/apache/fluo/api/config/FluoConfiguration.java -[Observer]: ../modules/api/src/main/java/org/apache/fluo/api/observer/Observer.java -[ObserverProvider]: ../modules/api/src/main/java/org/apache/fluo/api/observer/ObserverProvider.java -[fluo-conn.properties]: ../modules/distribution/src/main/config/fluo-conn.properties -[fluo-app.properties]: ../modules/distribution/src/main/config/fluo-app.properties -[API]: https://fluo.apache.org/apidocs/ -[metrics]: metrics.md -[slf4j]: http://www.slf4j.org/ -[logback]: http://logback.qos.ch/ -[phrasecount]: https://github.com/fluo-io/phrasecount -[fluo-env.sh]: ../modules/distribution/src/main/config/fluo-env.sh diff --git a/docs/architecture.md b/docs/architecture.md deleted file mode 100644 index 197a691..0000000 --- a/docs/architecture.md +++ /dev/null @@ -1,56 +0,0 @@ - - -# Fluo Architecture - -![fluo-architecture][1] - -## Fluo Application - -A **Fluo application** maintains a large-scale computation using a series of small transactional -updates. Fluo applications store their data in a **Fluo table**, which has a similar structure (row, -column, value) to an **Accumulo table** except that a Fluo table has no timestamps. A Fluo table -is implemented using an Accumulo table. While you could scan the Accumulo table used to implement -a Fluo table using an Accumulo client, you would read extra implementation-related data in addition -to your data. Therefore, developers should only interact with the data in a Fluo table by writing -Fluo client or observer code: - -* **Clients** ingest data or interact with Fluo from external applications (REST services, - crawlers, etc). These are generally user-started processes that use the Fluo API (a minimal client sketch appears after the dependency list below). -* **Observers** are user-provided functions run by Fluo Workers that execute transactions in response to notifications. Notifications are set by Fluo transactions, executing in a client or observer, when a requested column is modified. - -Multiple Fluo applications can run on a cluster at the same time. Fluo applications -consist of an oracle process and a configurable number of worker processes: - -* The **Oracle** process allocates timestamps for transactions. While only one Oracle is required, - Fluo can be configured to run extra Oracles that can take over if the primary Oracle fails. -* **Worker** processes run user code (called **observers**) that perform transactions. All workers - run the same observers. The number of worker instances is configured to handle the processing - workload. - -## Fluo Dependencies - -Fluo requires the following software to be running on the cluster: - -* **Accumulo** - Fluo stores its data in Accumulo and uses Accumulo's conditional mutations for - transactions. -* **Hadoop** - Each Fluo application runs its oracle and worker processes as Hadoop YARN - applications. HDFS is also required for Accumulo. -* **Zookeeper** - Fluo stores its metadata and state information in Zookeeper. Zookeeper is also - required for Accumulo.
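To make the client/observer distinction above concrete, here is a minimal read-only client sketch. It sticks to the API calls shown in the application documentation earlier in this diff (FluoConfiguration, FluoFactory.newClient, newSnapshot, get); the import paths are inferred from the module layout linked above, and the ZooKeeper string, application name, row, and column are hypothetical placeholders.

```java
import org.apache.fluo.api.client.FluoClient;
import org.apache.fluo.api.client.FluoFactory;
import org.apache.fluo.api.client.Snapshot;
import org.apache.fluo.api.config.FluoConfiguration;
import org.apache.fluo.api.data.Bytes;
import org.apache.fluo.api.data.Column;

public class ReadOnlyClient {
  public static void main(String[] args) throws Exception {
    FluoConfiguration config = new FluoConfiguration();
    config.setInstanceZookeepers("localhost/fluo"); // placeholder ZooKeeper connect string
    config.setApplicationName("myapp");             // placeholder application name

    // A client reads the Fluo table through the Fluo API rather than
    // scanning the backing Accumulo table directly.
    try (FluoClient client = FluoFactory.newClient(config);
         Snapshot snapshot = client.newSnapshot()) {
      Bytes value = snapshot.get(Bytes.of("row1"), new Column("fam", "qual")); // placeholder row/column
      System.out.println(value == null ? "value not set" : value.toString());
    }
  }
}
```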
[1]: resources/fluo-architecture.png diff --git a/docs/grafana.md b/docs/grafana.md deleted file mode 100644 index 2b6d79c..0000000 --- a/docs/grafana.md +++ /dev/null @@ -1,97 +0,0 @@ - - -# Fluo metrics in Grafana/InfluxDB - -This document describes how to send Fluo metrics to [InfluxDB], a time series database, and make -them viewable in [Grafana], a visualization tool. If you want general information on metrics, see -the [Fluo metrics][2] documentation. - -## Set up Grafana/InfluxDB on your own - -Follow the instructions below to set up InfluxDB and Grafana. - -1. Follow the standard installation instructions for [InfluxDB] and [Grafana]. As for versions, - the instructions below were written using InfluxDB v0.9.4.2 and Grafana v2.5.0. - -2. Add the following to your InfluxDB configuration to configure it to accept metrics in Graphite - format from Fluo. The configuration below contains templates that transform the Graphite - metrics into a format that is usable in InfluxDB. - - ``` - [[graphite]] - bind-address = ":2003" - enabled = true - database = "fluo_metrics" - protocol = "tcp" - consistency-level = "one" - separator = "_" - batch-size = 1000 - batch-pending = 5 - batch-timeout = "1s" - templates = [ - "fluo.class.*.*.*.*.* ..app.host.measurement.observer.field", - "fluo.class.*.*.*.* ..app.host.measurement.observer", - "fluo.system.*.*.*.* ..app.host.measurement.field", - "fluo.system.*.*.* ..app.host.measurement", - "fluo.app.*.*.* ..host.measurement.field", - "fluo.app.*.* ..host.measurement", - ] - ``` - -3. Fluo distributes a file called `fluo_metrics_setup.txt` that contains a list of commands that - set up InfluxDB. These commands will configure an InfluxDB user, retention policies, and - continuous queries that downsample data for the historical dashboard in Grafana. Run the command - below to execute the commands in this file: - - ``` - $INFLUXDB_HOME/bin/influx -import -path $FLUO_HOME/contrib/influxdb/fluo_metrics_setup.txt - ``` - -4. Configure the `fluo-app.properties` of your Fluo application to send Graphite metrics to InfluxDB. - Below is an example configuration (a Java sketch that writes these same properties appears after this list). Remember to replace `` with the actual host. - - ``` - fluo.metrics.reporter.graphite.enable=true - fluo.metrics.reporter.graphite.host= - fluo.metrics.reporter.graphite.port=2003 - fluo.metrics.reporter.graphite.frequency=30 - ``` - - The reporting frequency of 30 sec is required if you are using the provided Grafana dashboards - that are configured in the next step. - -5. Grafana needs to be configured to load dashboard JSON templates from a directory. Fluo - distributes two Grafana dashboard templates in its tarball distribution in the directory - `contrib/grafana`. Before restarting Grafana, you should copy the templates from your Fluo - installation to the `dashboards/` directory configured below. - - ``` - [dashboards.json] - enabled = true - path = /dashboards - ``` - -6. If you restart Grafana, you will see the Fluo dashboards configured but all of their charts will - be empty unless you have a Fluo application running and configured to send data to InfluxDB. - When you start sending data, you may need to refresh the dashboard page in the browser to start - viewing metrics.
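As referenced in step 4 above, here is a minimal sketch of generating the Graphite reporter settings from Java. This is only an illustration using the standard library, not part of Fluo's tooling; the output file name and the host value are hypothetical placeholders, while the property keys and values are the ones shown in step 4.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

public class WriteGraphiteReporterConfig {
  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    // Property keys and values copied from the fluo-app.properties example in step 4.
    props.setProperty("fluo.metrics.reporter.graphite.enable", "true");
    props.setProperty("fluo.metrics.reporter.graphite.host", "influxdb.example.com"); // placeholder host
    props.setProperty("fluo.metrics.reporter.graphite.port", "2003");
    props.setProperty("fluo.metrics.reporter.graphite.frequency", "30"); // 30s matches the provided dashboards

    // Writes a fragment you could merge into your application's fluo-app.properties by hand.
    try (OutputStream out = new FileOutputStream("graphite-reporter.properties")) {
      props.store(out, "Graphite reporter settings for Fluo metrics");
    }
  }
}
```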
- -[1]: https://dropwizard.github.io/metrics/3.1.0/ -[2]: metrics.md -[Grafana]: http://grafana.org/ -[InfluxDB]: https://influxdb.com/ diff --git a/docs/install.md b/docs/install.md deleted file mode 100644 index 2341148..0000000 --- a/docs/install.md +++ /dev/null @@ -1,128 +0,0 @@ - - -# Fluo Install Instructions - -Instructions for installing Apache Fluo and starting a Fluo application on a cluster where -Accumulo, Hadoop & Zookeeper are running. If you need help setting up these dependencies, see the -[related projects page][related] for external projects that may help. - -## Requirements - -Before you install Fluo, the following software must be installed and running on your local machine -or cluster: - -| Software | Recommended Version | Minimum Version | -|-------------|---------------------|-----------------| -| [Accumulo] | 1.7.2 | 1.6.1 | -| [Hadoop] | 2.7.2 | 2.6.0 | -| [Zookeeper] | 3.4.8 | | -| [Java] | JDK 8 | JDK 8 | - -## Obtain a distribution - -Before you can install Fluo, you will need to obtain a distribution tarball. It is recommended that -you download the [latest release][release]. You can also build a distribution from the master -branch by following these steps which create a tarball in `modules/distribution/target`: - - git clone https://github.com/apache/fluo.git - cd fluo/ - mvn package - -## Install Fluo - -After you obtain a Fluo distribution tarball, follow these steps to install Fluo. - -1. Choose a directory with plenty of space and untar the distribution: - - tar -xvzf fluo-1.1.0-incubating-bin.tar.gz - cd fluo-1.1.0-incubating - - The distribution contains a `fluo` script in `bin/` that administers Fluo and the - following configuration files in `conf/`: - - | Configuration file | Description | - |------------------------------|----------------------------------------------------------------------------------------------| - | [fluo-env.sh] | Configures classpath for `fluo` script. Required for all commands. | - | [fluo-conn.properties] | Configures connection to Fluo. Required for all commands. | - | [fluo-app.properties] | Template for configuration file passed to `fluo init` when initializing Fluo application. | - | [log4j.properties] | Configures logging | - | [fluo.properties.deprecated] | Deprecated Fluo configuration file. Replaced by fluo-conn.properties and fluo-app.properties | - -2. Configure [fluo-env.sh] to set up your classpath using jars from the versions of Hadoop, Accumulo, and -Zookeeper that you are using. Choose one of the two ways below to make these jars available to Fluo: - - * Set `HADOOP_PREFIX`, `ACCUMULO_HOME`, and `ZOOKEEPER_HOME` in your environment or configure - these variables in [fluo-env.sh]. Fluo will look in these locations for jars. - * Run `./lib/fetch.sh ahz` to download Hadoop, Accumulo, and Zookeeper jars to `lib/ahz` and - configure [fluo-env.sh] to look in this directory. By default, this command will download the - default versions set in [lib/ahz/pom.xml]. If you are not using the default versions, you can - override them: - - ./lib/fetch.sh ahz -Daccumulo.version=1.7.2 -Dhadoop.version=2.7.2 -Dzookeeper.version=3.4.8 - -3. Fluo needs more dependencies than what is available from Hadoop, Accumulo, and Zookeeper. These - extra dependencies need to be downloaded to `lib/` using the command below: - - ./lib/fetch.sh extra - -You are now ready to use the `fluo` script. - -## Fluo command script - -The Fluo command script is located at `bin/fluo` of your Fluo installation. 
All Fluo commands are -invoked by this script. - -Modify and add the following to your `~/.bashrc` if you want to be able to execute the fluo script -from any directory: - - export PATH=/path/to/fluo-1.1.0-incubating/bin:$PATH - -Source your `.bashrc` for the changes to take effect and test the script: - - source ~/.bashrc - fluo - -Running the script without any arguments prints a description of all commands. - - ./bin/fluo - -## Tuning Accumulo - -Fluo will reread the same data frequently when it checks conditions on mutations. When Fluo -initializes a table, it enables data caching to make this more efficient. However, you may need to -increase the amount of memory available for caching in the tserver by increasing -`tserver.cache.data.size`. Increasing this may require increasing the maximum tserver Java heap size -in `accumulo-env.sh`. - -Fluo will run many client threads, so you will want to ensure the tablet server has enough threads. You should -probably increase the `tserver.server.threads.minimum` Accumulo setting. - -Using at least Accumulo 1.6.1 is recommended because multiple performance bugs were fixed. - -[Accumulo]: https://accumulo.apache.org/ -[Hadoop]: http://hadoop.apache.org/ -[Zookeeper]: http://zookeeper.apache.org/ -[Java]: http://openjdk.java.net/ -[related]: https://fluo.apache.org/related-projects/ -[release]: https://fluo.apache.org/download/ -[fluo-conn.properties]: ../modules/distribution/src/main/config/fluo-conn.properties -[fluo-app.properties]: ../modules/distribution/src/main/config/fluo-app.properties -[log4j.properties]: ../modules/distribution/src/main/config/log4j.properties -[fluo.properties.deprecated]: ../modules/distribution/src/main/config/fluo.properties.deprecated -[fluo-env.sh]: ../modules/distribution/src/main/config/fluo-env.sh -[lib/ahz/pom.xml]: ../modules/distribution/src/main/lib/ahz/pom.xml diff --git a/docs/metrics.md b/docs/metrics.md deleted file mode 100644 index 8db93c7..0000000 --- a/docs/metrics.md +++ /dev/null @@ -1,130 +0,0 @@ - - -# Fluo Metrics - -A Fluo application can be configured (in [fluo-app.properties]) to report metrics. When metrics are -configured, Fluo will report some 'default' metrics about an application that help users monitor its -performance. Users can also write code to report 'application-specific' metrics from their -applications. Both 'application-specific' and 'default' metrics share the same reporter configured -by [fluo-app.properties] and are described in detail below. - -## Configuring reporters - -Fluo metrics are not published by default. To publish metrics, configure a reporter in the 'metrics' -section of [fluo-app.properties]. There are several different reporter types (i.e. Console, CSV, -Graphite, JMX, SLF4J) that are implemented using [Dropwizard]. The choice of which reporter to use -depends on the visualization tool used. If you are not currently using a visualization tool, there -is [documentation][grafana] for reporting Fluo metrics to Grafana/InfluxDB. - -## Metrics names - -When Fluo metrics are reported, they are published using a naming scheme that encodes additional -information. This additional information is represented using all caps variables (e.g. `METRIC`) -below.
Default metrics start with `fluo.class` or `fluo.system` and have the following naming schemes: - - fluo.class.APPLICATION.REPORTER_ID.METRIC.CLASS - fluo.system.APPLICATION.REPORTER_ID.METRIC - -Application metrics start with `fluo.app` and have the following scheme: - - fluo.app.REPORTER_ID.METRIC - -The variables below describe the additional information that is encoded in metrics names. - -1. `APPLICATION` - Fluo application name -2. `REPORTER_ID` - Unique ID of the Fluo oracle, worker, or client that is reporting the metric. - When running in YARN, this ID is of the format `worker-INSTANCE_ID` or `oracle-INSTANCE_ID` - where `INSTANCE_ID` corresponds to the instance number. When not running in YARN, this ID consists - of a hostname and a base36 long that is unique across all fluo processes. -3. `METRIC` - Name of the metric. For 'default' metrics, this is set by Fluo. For 'application' - metrics, this is set by the user. Names should be unique and should avoid using a period '.' in the name. -4. `CLASS` - Name of the Fluo observer or loader class that produced the metric. This allows things like - transaction collisions to be tracked per class. - -## Application-specific metrics - -Application metrics are implemented by retrieving a [MetricsReporter] from an [Observer], [Loader], -or [FluoClient]. These metrics are named using the format `fluo.app.REPORTER_ID.METRIC`. - -## Default metrics - -Default metrics are reported for a particular Observer/Loader class or system-wide. - -Below are metrics that are reported from each Observer/Loader class that is configured in a Fluo -application. These metrics are reported after each transaction and named using the format -`fluo.class.APPLICATION.REPORTER_ID.METRIC.CLASS`. - -* tx_lock_wait_time - [Timer] - - Time the transaction spent waiting on locks held by other transactions. - - Only updated for transactions that have non-zero lock time. -* tx_execution_time - [Timer] - - Time the transaction took to execute. - - Updated for failed and successful transactions. - - This does not include commit time, only the time from start until commit is called. -* tx_with_collision - [Meter] - - Rate of transactions with collisions. -* tx_collisions - [Meter] - - Rate of collisions. -* tx_entries_set - [Meter] - - Rate of row/columns set by a transaction. -* tx_entries_read - [Meter] - - Rate of row/columns read by a transaction that existed. - - There is currently no count of all reads (including non-existent data). -* tx_locks_timedout - [Meter] - - Rate of timed-out locks rolled back by a transaction. - - These are locks that are held for very long periods by another transaction that appears to be - alive based on zookeeper. -* tx_locks_dead - [Meter] - - Rate of dead locks rolled back by a transaction. - - These are locks held by a process that appears to be dead according to zookeeper. -* tx_status_`` - [Meter] - - Rate of the different ways (i.e. ``) a transaction can terminate. - -Below are system-wide metrics that are reported for the entire Fluo application. These metrics are -named using the format `fluo.system.APPLICATION.REPORTER_ID.METRIC`. - -* oracle_response_time - [Timer] - - Time each RPC call to the oracle for stamps took. -* oracle_client_stamps - [Histogram] - - Number of stamps requested for each request for stamps to the server. -* oracle_server_stamps - [Histogram] - - Number of stamps requested for each request for stamps from a client. -* worker_notifications_queued - [Gauge] - - The current number of notifications queued for processing.
-* transactor_committing - [Gauge] - - The current number of transactions that are working their way through the commit steps. - -Histograms and Timers have a counter. In the case of a histogram, the counter is the number of times -the metric was updated and not a sum of the updates. For example if a request for 5 timestamps was -made to the oracle followed by a request for 3 timestamps, then the count for `oracle_server_stamps` -would be 2 and the mean would be (5+3)/2. - -[fluo-app.properties]: ../modules/distribution/src/main/config/fluo-app.properties -[Dropwizard]: https://dropwizard.github.io/metrics/3.1.0/ -[grafana]: grafana.md -[MetricsReporter]: ../modules/api/src/main/java/org/apache/fluo/api/metrics/MetricsReporter.java -[Observer]: ../modules/api/src/main/java/org/apache/fluo/api/observer/Observer.java -[Loader]: ../modules/api/src/main/java/org/apache/fluo/api/client/Loader.java -[FluoClient]: ../modules/api/src/main/java/org/apache/fluo/api/client/FluoClient.java -[Timer]: https://dropwizard.github.io/metrics/3.1.0/getting-started/#timers -[Counter]: https://dropwizard.github.io/metrics/3.1.0/getting-started/#counters -[Histogram]: https://dropwizard.github.io/metrics/3.1.0/getting-started/#histograms -[Gauge]: https://dropwizard.github.io/metrics/3.1.0/getting-started/#gauges -[Meter]: https://dropwizard.github.io/metrics/3.1.0/getting-started/#meters diff --git a/docs/resources/fluo-architecture.odg b/docs/resources/fluo-architecture.odg deleted file mode 100644 index fb2a9ad..0000000 Binary files a/docs/resources/fluo-architecture.odg and /dev/null differ diff --git a/docs/resources/fluo-architecture.png b/docs/resources/fluo-architecture.png deleted file mode 100644 index 3ba96fd..0000000 Binary files a/docs/resources/fluo-architecture.png and /dev/null differ diff --git a/modules/distribution/src/main/assembly/bin.xml b/modules/distribution/src/main/assembly/bin.xml index f4c53ff..aa3f16a 100644 --- a/modules/distribution/src/main/assembly/bin.xml +++ b/modules/distribution/src/main/assembly/bin.xml @@ -75,10 +75,6 @@ - ../../docs - docs - - ../../contrib/grafana contrib/grafana -- To stop receiving notification emails like this one, please contact ['"commits@fluo.apache.org" '].