flume-commits mailing list archives

From hshreedha...@apache.org
Subject svn commit: r1682982 [1/2] - in /flume/site/trunk/content/sphinx: FlumeDeveloperGuide.rst FlumeUserGuide.rst download.rst index.rst releases/1.6.0.rst releases/index.rst
Date Mon, 01 Jun 2015 19:49:44 GMT
Author: hshreedharan
Date: Mon Jun  1 19:49:44 2015
New Revision: 1682982

URL: http://svn.apache.org/r1682982
Log:
FLUME-2702. Update site for Flume 1.6.0 

Added:
    flume/site/trunk/content/sphinx/releases/1.6.0.rst   (with props)
Modified:
    flume/site/trunk/content/sphinx/FlumeDeveloperGuide.rst
    flume/site/trunk/content/sphinx/FlumeUserGuide.rst
    flume/site/trunk/content/sphinx/download.rst
    flume/site/trunk/content/sphinx/index.rst
    flume/site/trunk/content/sphinx/releases/index.rst

Modified: flume/site/trunk/content/sphinx/FlumeDeveloperGuide.rst
URL: http://svn.apache.org/viewvc/flume/site/trunk/content/sphinx/FlumeDeveloperGuide.rst?rev=1682982&r1=1682981&r2=1682982&view=diff
==============================================================================
--- flume/site/trunk/content/sphinx/FlumeDeveloperGuide.rst (original)
+++ flume/site/trunk/content/sphinx/FlumeDeveloperGuide.rst Mon Jun  1 19:49:44 2015
@@ -15,7 +15,7 @@
 
 
 ======================================
-Flume 1.5.2 Developer Guide
+Flume 1.6.0 Developer Guide
 ======================================
 
 Introduction
@@ -277,6 +277,116 @@ properties:
 
   request-timeout = 20000              # Must be >=1000 (default: 20000)
 
+Secure RPC client - Thrift
+''''''''''''''''''''''''''
+
+As of Flume 1.6.0, the Thrift source and sink support kerberos-based authentication.
+The client needs to use the ``getThriftInstance`` method of ``SecureRpcClientFactory``
+to obtain a ``SecureThriftRpcClient``. ``SecureThriftRpcClient`` extends
+``ThriftRpcClient``, which implements the ``RpcClient`` interface. The kerberos
+authentication code resides in the flume-ng-auth module, which must be
+on the classpath when using the ``SecureRpcClientFactory``. Both the client
+principal and the client keytab should be passed in as parameters through the
+properties; they are the credentials the client uses to authenticate
+against the kerberos KDC. In addition, the server principal of the destination
+Thrift source to which this client connects should also be provided.
+The following example shows how to use the ``SecureRpcClientFactory``
+within a user's data-generating application:
+
+.. code-block:: java
+
+  import org.apache.flume.Event;
+  import org.apache.flume.EventDeliveryException;
+  import org.apache.flume.event.EventBuilder;
+  import org.apache.flume.api.SecureRpcClientFactory;
+  import org.apache.flume.api.RpcClientConfigurationConstants;
+  import org.apache.flume.api.RpcClient;
+  import java.nio.charset.Charset;
+  import java.util.Properties;
+
+  public class MyApp {
+    public static void main(String[] args) {
+      MySecureRpcClientFacade client = new MySecureRpcClientFacade();
+      // Initialize client with the remote Flume agent's host, port
+      Properties props = new Properties();
+      props.setProperty(RpcClientConfigurationConstants.CONFIG_CLIENT_TYPE, "thrift");
+      props.setProperty("hosts", "h1");
+      props.setProperty("hosts.h1", "client.example.org"+":"+ String.valueOf(41414));
+
+      // Initialize client with the kerberos authentication related properties
+      props.setProperty("kerberos", "true");
+      props.setProperty("client-principal", "flumeclient/client.example.org@EXAMPLE.ORG");
+      props.setProperty("client-keytab", "/tmp/flumeclient.keytab");
+      props.setProperty("server-principal", "flume/server.example.org@EXAMPLE.ORG");
+      client.init(props);
+
+      // Send 10 events to the remote Flume agent. That agent should be
+      // configured to listen with a ThriftSource in kerberos mode.
+      String sampleData = "Hello Flume!";
+      for (int i = 0; i < 10; i++) {
+        client.sendDataToFlume(sampleData);
+      }
+
+      client.cleanUp();
+    }
+  }
+
+  class MySecureRpcClientFacade {
+    private RpcClient client;
+    private Properties properties;
+
+    public void init(Properties properties) {
+      // Setup the RPC connection
+      this.properties = properties;
+      // Create the SecureThriftRpcClient instance by using SecureRpcClientFactory
+      this.client = SecureRpcClientFactory.getThriftInstance(properties);
+    }
+
+    public void sendDataToFlume(String data) {
+      // Create a Flume Event object that encapsulates the sample data
+      Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
+
+      // Send the event
+      try {
+        client.append(event);
+      } catch (EventDeliveryException e) {
+        // clean up and recreate the client
+        client.close();
+        client = null;
+        client = SecureRpcClientFactory.getThriftInstance(properties);
+      }
+    }
+
+    public void cleanUp() {
+      // Close the RPC connection
+      client.close();
+    }
+  }
+
+The remote ``ThriftSource`` should be started in kerberos mode.
+Below is an example Flume agent configuration that's waiting for a connection
+from MyApp:
+
+.. code-block:: properties
+
+  a1.channels = c1
+  a1.sources = r1
+  a1.sinks = k1
+
+  a1.channels.c1.type = memory
+
+  a1.sources.r1.channels = c1
+  a1.sources.r1.type = thrift
+  a1.sources.r1.bind = 0.0.0.0
+  a1.sources.r1.port = 41414
+  a1.sources.r1.kerberos = true
+  a1.sources.r1.agent-principal = flume/server.example.org@EXAMPLE.ORG
+  a1.sources.r1.agent-keytab = /tmp/flume.keytab
+
+
+  a1.sinks.k1.channel = c1
+  a1.sinks.k1.type = logger
+
 Failover Client
 '''''''''''''''
 
@@ -450,7 +560,7 @@ sources, sinks, and channels are allowed
 is a special embedded source and events should be send to the source
 via the put, putAll methods on the EmbeddedAgent object. Only File Channel
 and Memory Channel are allowed as channels while Avro Sink is the only
-supported sink.
+supported sink. Interceptors are also supported by the embedded agent.
 
 Note: The embedded agent has a dependency on hadoop-core.jar.
 
@@ -459,18 +569,29 @@ full Agent. The following is an exhausti
 
 Required properties are in **bold**.
 
-====================  ================  ==============================================
-Property Name         Default           Description
-====================  ================  ==============================================
-source.type           embedded          The only available source is the embedded source.
-**channel.type**      --                Either ``memory`` or ``file`` which correspond to MemoryChannel and FileChannel respectively.
-channel.*             --                Configuration options for the channel type requested, see MemoryChannel or FileChannel user guide for an exhaustive list.
-**sinks**             --                List of sink names
-**sink.type**         --                Property name must match a name in the list of sinks. Value must be ``avro``
-sink.*                --                Configuration options for the sink. See AvroSink user guide for an exhaustive list, however note AvroSink requires at least hostname and port.
-**processor.type**    --                Either ``failover`` or ``load_balance`` which correspond to FailoverSinksProcessor and LoadBalancingSinkProcessor respectively.
-processor.*           --                Configuration options for the sink processor selected. See FailoverSinksProcessor and LoadBalancingSinkProcessor user guide for an exhaustive list.
-====================  ================  ==============================================
+=====================  ================  ======================================================================
+Property Name          Default           Description
+=====================  ================  ======================================================================
+source.type            embedded          The only available source is the embedded source.
+**channel.type**       --                Either ``memory`` or ``file`` which correspond
+                                         to MemoryChannel and FileChannel respectively.
+channel.*              --                Configuration options for the channel type requested,
+                                         see MemoryChannel or FileChannel user guide for an exhaustive list.
+**sinks**              --                List of sink names
+**sink.type**          --                Property name must match a name in the list of sinks.
+                                         Value must be ``avro``
+sink.*                 --                Configuration options for the sink.
+                                         See AvroSink user guide for an exhaustive list,
+                                         however note AvroSink requires at least hostname and port.
+**processor.type**     --                Either ``failover`` or ``load_balance`` which correspond
+                                         to FailoverSinksProcessor and LoadBalancingSinkProcessor respectively.
+processor.*            --                Configuration options for the sink processor selected.
+                                         See FailoverSinksProcessor and LoadBalancingSinkProcessor
+                                         user guide for an exhaustive list.
+source.interceptors    --                Space-separated list of interceptors
+source.interceptors.*  --                Configuration options for individual interceptors
+                                         specified in the source.interceptors property
+=====================  ================  ======================================================================
 
 Below is an example of how to use the agent:
 
@@ -487,6 +608,10 @@ Below is an example of how to use the ag
     properties.put("sink2.hostname", "collector2.apache.org");
     properties.put("sink2.port",  "5565");
     properties.put("processor.type", "load_balance");
+    properties.put("source.interceptors", "i1");
+    properties.put("source.interceptors.i1.type", "static");
+    properties.put("source.interceptors.i1.key", "key1");
+    properties.put("source.interceptors.i1.value", "value1");
 
     EmbeddedAgent agent = new EmbeddedAgent("myagent");
 

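Note on the Secure RPC client section added to FlumeDeveloperGuide.rst above:
the client properties it sets can be checked before they are handed to
``SecureRpcClientFactory.getThriftInstance``. The following is only a minimal
sketch using plain ``java.util.Properties``; the class ``SecureThriftClientProps``
and its methods are hypothetical helpers, not part of Flume, and the key
``client.type`` is an assumption about the value of
``RpcClientConfigurationConstants.CONFIG_CLIENT_TYPE``. The remaining keys are
copied from the example in the guide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of Flume): builds and sanity-checks the
// property set used by the Secure RPC client example in the guide.
final class SecureThriftClientProps {

  // Assumption: RpcClientConfigurationConstants.CONFIG_CLIENT_TYPE resolves
  // to "client.type". All other keys are taken verbatim from the guide.
  static final String CLIENT_TYPE_KEY = "client.type";

  static Properties build(String host, int port, String clientPrincipal,
                          String clientKeytab, String serverPrincipal) {
    Properties props = new Properties();
    props.setProperty(CLIENT_TYPE_KEY, "thrift");
    props.setProperty("hosts", "h1");
    props.setProperty("hosts.h1", host + ":" + port);
    props.setProperty("kerberos", "true");
    props.setProperty("client-principal", clientPrincipal);
    props.setProperty("client-keytab", clientKeytab);
    props.setProperty("server-principal", serverPrincipal);
    return props;
  }

  // Returns the kerberos-related keys that are absent or blank.
  // If "kerberos" is not set to "true", no key is considered required.
  static List<String> missingKerberosKeys(Properties props) {
    List<String> missing = new ArrayList<>();
    if (!"true".equalsIgnoreCase(props.getProperty("kerberos"))) {
      return missing;
    }
    String[] required = {"client-principal", "client-keytab", "server-principal"};
    for (String key : required) {
      String value = props.getProperty(key);
      if (value == null || value.trim().isEmpty()) {
        missing.add(key);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    Properties props = build("client.example.org", 41414,
        "flumeclient/client.example.org@EXAMPLE.ORG",
        "/tmp/flumeclient.keytab",
        "flume/server.example.org@EXAMPLE.ORG");
    System.out.println("Missing kerberos keys: " + missingKerberosKeys(props));
  }
}
```

A caller could refuse to create the client when ``missingKerberosKeys`` returns
a non-empty list, which surfaces a configuration mistake before any connection
to the KDC or the Thrift source is attempted.
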
Modified: flume/site/trunk/content/sphinx/FlumeUserGuide.rst
URL: http://svn.apache.org/viewvc/flume/site/trunk/content/sphinx/FlumeUserGuide.rst?rev=1682982&r1=1682981&r2=1682982&view=diff
==============================================================================
--- flume/site/trunk/content/sphinx/FlumeUserGuide.rst (original)
+++ flume/site/trunk/content/sphinx/FlumeUserGuide.rst Mon Jun  1 19:49:44 2015
@@ -15,7 +15,7 @@
 
 
 ======================================
-Flume 1.5.2 User Guide
+Flume 1.6.0 User Guide
 ======================================
 
 Introduction
@@ -234,6 +234,31 @@ The original Flume terminal will output
 
 Congratulations - you've successfully configured and deployed a Flume agent! Subsequent sections cover agent configuration in much more detail.
 
+
+Zookeeper based Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Flume supports Agent configurations via Zookeeper. *This is an experimental feature.* The configuration file needs to be uploaded
+to Zookeeper, under a configurable prefix. The configuration file is stored in Zookeeper Node data.
+The Zookeeper Node tree for agents a1 and a2 would look like this:
+
+.. code-block:: properties
+
+  - /flume
+   |- /a1 [Agent config file]
+   |- /a2 [Agent config file]
+
+Once the configuration file is uploaded, start the agent with the following options::
+
+  $ bin/flume-ng agent --conf conf -z zkhost:2181,zkhost1:2181 -p /flume --name a1 -Dflume.root.logger=INFO,console
+
+==================   ================  =========================================================================
+Argument Name        Default           Description
+==================   ================  =========================================================================
+**z**                --                Zookeeper connection string. Comma separated list of hostname:port
+**p**                /flume            Base Path in Zookeeper to store Agent configurations
+==================   ================  =========================================================================
+
 Installing third-party plugins
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -707,7 +732,7 @@ keystore-password    --                T
 keystore-type        JKS               The type of the Java keystore. This can be "JKS" or "PKCS12".
 exclude-protocols    SSLv3             Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified.
 ipFilter             false             Set this to true to enable ipFiltering for netty
-ipFilter.rules       --                Define N netty ipFilter pattern rules with this config.
+ipFilterRules        --                Define N netty ipFilter pattern rules with this config.
 ==================   ================  ===================================================
 
 Example for agent named a1:
@@ -721,15 +746,15 @@ Example for agent named a1:
   a1.sources.r1.bind = 0.0.0.0
   a1.sources.r1.port = 4141
 
-Example of ipFilter.rules
+Example of ipFilterRules
 
-ipFilter.rules defines N netty ipFilters separated by a comma a pattern rule must be in this format.
+ipFilterRules defines N netty ipFilters separated by commas. Each pattern rule must be in this format:
 
 <'allow' or deny>:<'ip' or 'name' for computer name>:<pattern>
 or
 allow/deny:ip/name:pattern
 
-example: ipFilter.rules=allow:ip:127.*,allow:name:localhost,deny:ip:*
+example: ipFilterRules=allow:ip:127.*,allow:name:localhost,deny:ip:*
 
 Note that the first rule to match will apply as the example below shows from a client on the localhost
 
@@ -742,6 +767,9 @@ Thrift Source
 Listens on Thrift port and receives events from external Thrift client streams.
 When paired with the built-in ThriftSink on another (previous hop) Flume agent,
 it can create tiered collection topologies.
+Thrift source can be configured to start in secure mode by enabling kerberos authentication.
+agent-principal and agent-keytab are the properties used by the
+Thrift source to authenticate to the kerberos KDC.
 Required properties are in **bold**.
 
 ==================   ===========  ===================================================
@@ -756,6 +784,14 @@ selector.type
 selector.*
 interceptors         --           Space separated list of interceptors
 interceptors.*
+ssl                  false        Set this to true to enable SSL encryption. You must also specify a "keystore" and a "keystore-password".
+keystore             --           This is the path to a Java keystore file. Required for SSL.
+keystore-password    --           The password for the Java keystore. Required for SSL.
+keystore-type        JKS          The type of the Java keystore. This can be "JKS" or "PKCS12".
+exclude-protocols    SSLv3        Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified.
+kerberos             false        Set to true to enable kerberos authentication. In kerberos mode, agent-principal and agent-keytab are required for successful authentication. In secure mode, the Thrift source will accept connections only from Thrift clients that have kerberos enabled and are successfully authenticated to the kerberos KDC.
+agent-principal      --           The kerberos principal used by the Thrift Source to authenticate to the kerberos KDC.
+agent-keytab         --           The keytab location used by the Thrift Source in combination with the agent-principal to authenticate to the kerberos KDC.
 ==================   ===========  ===================================================
 
 Example for agent named a1:
@@ -793,6 +829,7 @@ restartThrottle  10000        Amount of
 restart          false        Whether the executed cmd should be restarted if it dies
 logStdErr        false        Whether the command's stderr should be logged
 batchSize        20           The max number of lines to read and send to the channel at a time
+batchTimeout     3000         Amount of time (in milliseconds) to wait, if the buffer size was not reached, before data is pushed downstream
 selector.type    replicating  replicating or multiplexing
 selector.*                    Depends on the selector.type value
 interceptors     --           Space-separated list of interceptors
@@ -841,9 +878,9 @@ invoked directly.  Common values for 'sh
 
 .. code-block:: properties
 
-  agent_foo.sources.tailsource-1.type = exec
-  agent_foo.sources.tailsource-1.shell = /bin/bash -c
-  agent_foo.sources.tailsource-1.command = for i in /path/*.txt; do cat $i; done
+  a1.sources.tailsource-1.type = exec
+  a1.sources.tailsource-1.shell = /bin/bash -c
+  a1.sources.tailsource-1.command = for i in /path/*.txt; do cat $i; done
 
 JMS Source
 ~~~~~~~~~~~
@@ -986,54 +1023,13 @@ Example for an agent named agent-1:
 
 .. code-block:: properties
 
-  agent-1.channels = ch-1
-  agent-1.sources = src-1
+  a1.channels = ch-1
+  a1.sources = src-1
 
-  agent-1.sources.src-1.type = spooldir
-  agent-1.sources.src-1.channels = ch-1
-  agent-1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
-  agent-1.sources.src-1.fileHeader = true
-
-Twitter 1% firehose Source (experimental)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. warning::
-  This source is hightly experimental and may change between minor versions of Flume.
-  Use at your own risk.
-
-Experimental source that connects via Streaming API to the 1% sample twitter
-firehose, continously downloads tweets, converts them to Avro format and
-sends Avro events to a downstream Flume sink. Requires the consumer and 
-access tokens and secrets of a Twitter developer account.
-Required properties are in **bold**.
-
-====================== ===========  ===================================================
-Property Name          Default      Description
-====================== ===========  ===================================================
-**channels**           --
-**type**               --           The component type name, needs to be ``org.apache.flume.source.twitter.TwitterSource``
-**consumerKey**        --           OAuth consumer key
-**consumerSecret**     --           OAuth consumer secret
-**accessToken**        --           OAuth access token
-**accessTokenSecret**  --           OAuth toekn secret 
-maxBatchSize           1000         Maximum number of twitter messages to put in a single batch
-maxBatchDurationMillis 1000         Maximum number of milliseconds to wait before closing a batch
-====================== ===========  ===================================================
-
-Example for agent named a1:
-
-.. code-block:: properties
-
-  a1.sources = r1
-  a1.channels = c1
-  a1.sources.r1.type = org.apache.flume.source.twitter.TwitterSource
-  a1.sources.r1.channels = c1
-  a1.sources.r1.consumerKey = YOUR_TWITTER_CONSUMER_KEY
-  a1.sources.r1.consumerSecret = YOUR_TWITTER_CONSUMER_SECRET
-  a1.sources.r1.accessToken = YOUR_TWITTER_ACCESS_TOKEN
-  a1.sources.r1.accessTokenSecret = YOUR_TWITTER_ACCESS_TOKEN_SECRET
-  a1.sources.r1.maxBatchSize = 10
-  a1.sources.r1.maxBatchDurationMillis = 200
+  a1.sources.src-1.type = spooldir
+  a1.sources.src-1.channels = ch-1
+  a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
+  a1.sources.src-1.fileHeader = true
 
 Event Deserializers
 '''''''''''''''''''
@@ -1094,6 +1090,95 @@ Property Name               Default
 deserializer.maxBlobLength  100000000           The maximum number of bytes to read and buffer for a given request
 ==========================  ==================  =======================================================================
 
+Twitter 1% firehose Source (experimental)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. warning::
+  This source is highly experimental and may change between minor versions of Flume.
+  Use at your own risk.
+
+Experimental source that connects via Streaming API to the 1% sample twitter
+firehose, continuously downloads tweets, converts them to Avro format and
+sends Avro events to a downstream Flume sink. Requires the consumer and 
+access tokens and secrets of a Twitter developer account.
+Required properties are in **bold**.
+
+====================== ===========  ===================================================
+Property Name          Default      Description
+====================== ===========  ===================================================
+**channels**           --
+**type**               --           The component type name, needs to be ``org.apache.flume.source.twitter.TwitterSource``
+**consumerKey**        --           OAuth consumer key
+**consumerSecret**     --           OAuth consumer secret
+**accessToken**        --           OAuth access token
+**accessTokenSecret**  --           OAuth token secret
+maxBatchSize           1000         Maximum number of twitter messages to put in a single batch
+maxBatchDurationMillis 1000         Maximum number of milliseconds to wait before closing a batch
+====================== ===========  ===================================================
+
+Example for agent named a1:
+
+.. code-block:: properties
+
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = org.apache.flume.source.twitter.TwitterSource
+  a1.sources.r1.channels = c1
+  a1.sources.r1.consumerKey = YOUR_TWITTER_CONSUMER_KEY
+  a1.sources.r1.consumerSecret = YOUR_TWITTER_CONSUMER_SECRET
+  a1.sources.r1.accessToken = YOUR_TWITTER_ACCESS_TOKEN
+  a1.sources.r1.accessTokenSecret = YOUR_TWITTER_ACCESS_TOKEN_SECRET
+  a1.sources.r1.maxBatchSize = 10
+  a1.sources.r1.maxBatchDurationMillis = 200
+
+Kafka Source
+~~~~~~~~~~~~
+
+Kafka Source is an Apache Kafka consumer that reads messages from a Kafka topic.
+If you have multiple Kafka sources running, you can configure them with the same Consumer Group
+so each will read a unique set of partitions for the topic.
+
+
+
+===============================  ===========  ===================================================
+Property Name                    Default      Description
+===============================  ===========  ===================================================
+**channels**                     --
+**type**                         --           The component type name, needs to be ``org.apache.flume.source.kafka.KafkaSource``
+**zookeeperConnect**             --           URI of ZooKeeper used by Kafka cluster
+**groupId**                      flume        Unique identifier of consumer group. Setting the same id in multiple sources or agents
+                                              indicates that they are part of the same consumer group
+**topic**                        --           Kafka topic we'll read messages from. Currently, only a single topic is supported.
+batchSize                        1000         Maximum number of messages written to Channel in one batch
+batchDurationMillis              1000         Maximum time (in ms) before a batch will be written to Channel.
+                                              The batch will be written when the size or the time limit is reached, whichever comes first.
+Other Kafka Consumer Properties  --           These properties are used to configure the Kafka Consumer. Any consumer property supported
+                                              by Kafka can be used. The only requirement is to prepend the property name with the prefix ``kafka.``.
+                                              For example: kafka.consumer.timeout.ms
+                                              Check `Kafka documentation <https://kafka.apache.org/08/configuration.html#consumerconfigs>`_ for details
+===============================  ===========  ===================================================
+
+.. note:: The Kafka Source overrides two Kafka consumer parameters.
+          auto.commit.enable is set to "false" by the source, and we commit every batch. For improved performance
+          this can be set to "true"; however, this can lead to loss of data.
+          consumer.timeout.ms is set to 10ms, so when we check Kafka for new data we wait at most 10ms for the data to arrive.
+          Setting this to a higher value can reduce CPU utilization (we'll poll Kafka in less of a tight loop), but also means
+          higher latency in writing batches to channel (since we'll wait longer for data to arrive).
+
+
+Example for agent named tier1:
+
+.. code-block:: properties
+
+    tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
+    tier1.sources.source1.channels = channel1
+    tier1.sources.source1.zookeeperConnect = localhost:2181
+    tier1.sources.source1.topic = test1
+    tier1.sources.source1.groupId = flume
+    tier1.sources.source1.kafka.consumer.timeout.ms = 100
+
+
+
 
 NetCat Source
 ~~~~~~~~~~~~~
@@ -1182,8 +1267,13 @@ Property Name    Default      Descriptio
 **host**         --           Host name or IP address to bind to
 **port**         --           Port # to bind to
 eventSize        2500         Maximum size of a single event line, in bytes
-keepFields       false        Setting this to true will preserve the Priority,
+keepFields       none         Setting this to 'all' will preserve the Priority,
                               Timestamp and Hostname in the body of the event.
+                              A space-separated list of fields to include
+                              is allowed as well. Currently, the following
+                              fields can be included: priority, version,
+                              timestamp, hostname. The values 'true' and 'false'
+                              have been deprecated in favor of 'all' and 'none'.
 selector.type                 replicating or multiplexing
 selector.*       replicating  Depends on the selector.type value
 interceptors     --           Space-separated list of interceptors
@@ -1220,8 +1310,13 @@ Property Name         Default
 **host**              --                Host name or IP address to bind to.
 **ports**             --                Space-separated list (one or more) of ports to bind to.
 eventSize             2500              Maximum size of a single event line, in bytes.
-keepFields            false             Setting this to true will preserve the
+keepFields            none              Setting this to 'all' will preserve the
                                         Priority, Timestamp and Hostname in the body of the event.
+                                        A space-separated list of fields to include
+                                        is allowed as well. Currently, the following
+                                        fields can be included: priority, version,
+                                        timestamp, hostname. The values 'true' and 'false'
+                                        have been deprecated in favor of 'all' and 'none'.
 portHeader            --                If specified, the port number will be stored in the header of each event using the header name specified here. This allows for interceptors and channel selectors to customize routing logic based on the incoming port.
 charset.default       UTF-8             Default character set used while parsing syslog events into strings.
 charset.port.<port>   --                Character set is configurable on a per-port basis.
@@ -1374,6 +1469,37 @@ Property Name          Default
 handler.maxBlobLength  100000000           The maximum number of bytes to read and buffer for a given request
 =====================  ==================  ============================================================================
 
+Stress Source
+~~~~~~~~~~~~~
+
+StressSource is an internal load-generating source implementation which is very useful for
+stress tests. It allows the user to configure the size of the Event payload, with empty headers.
+The user can configure the total number of events to be sent as well as the maximum number of
+successful events to be delivered.
+
+Required properties are in **bold**.
+
+===================  ===========  ===================================================
+Property Name        Default      Description
+===================  ===========  ===================================================
+**type**             --           The component type name, needs to be ``org.apache.flume.source.StressSource``
+size                 500          Payload size of each Event. Unit:**byte**
+maxTotalEvents       -1           Maximum number of Events to be sent
+maxSuccessfulEvents  -1           Maximum number of Events successfully sent
+batchSize            1            Number of Events to be sent in one batch
+===================  ===========  ===================================================
+
+Example for agent named **a1**:
+
+.. code-block:: properties
+
+  a1.sources = stresssource-1
+  a1.channels = memoryChannel-1
+  a1.sources.stresssource-1.type = org.apache.flume.source.StressSource
+  a1.sources.stresssource-1.size = 10240
+  a1.sources.stresssource-1.maxTotalEvents = 1000000
+  a1.sources.stresssource-1.channels = memoryChannel-1
+
 Legacy Sources
 ~~~~~~~~~~~~~~
 
@@ -1485,15 +1611,16 @@ Flume should use ScribeSource based on T
 For deployment of Scribe please follow the guide from Facebook.
 Required properties are in **bold**.
 
-==============  ===========  ==============================================
-Property Name   Default      Description
-==============  ===========  ==============================================
-**type**        --           The component type name, needs to be ``org.apache.flume.source.scribe.ScribeSource``
-port            1499         Port that Scribe should be connected
-workerThreads   5            Handing threads number in Thrift
+====================  ===========  ==============================================
+Property Name         Default      Description
+====================  ===========  ==============================================
+**type**              --           The component type name, needs to be ``org.apache.flume.source.scribe.ScribeSource``
+port                  1499         Port to which Scribe should connect
+maxReadBufferBytes    16384000     Thrift Default FrameBuffer Size
+workerThreads         5            Number of handling threads in Thrift
 selector.type
 selector.*
-==============  ===========  ==============================================
+====================  ===========  ==============================================
 
 Example for agent named a1:
 
@@ -1537,12 +1664,14 @@ Alias      Description
 %B         locale's long month name (January, February, ...)
 %c         locale's date and time (Thu Mar 3 23:05:25 2005)
 %d         day of month (01)
+%e         day of month without padding (1)
 %D         date; same as %m/%d/%y
 %H         hour (00..23)
 %I         hour (01..12)
 %j         day of year (001..366)
 %k         hour ( 0..23)
 %m         month (01..12)
+%n         month without padding (1..12)
 %M         minute (00..59)
 %p         locale's equivalent of am or pm
 %s         seconds since 1970-01-01 00:00:00 UTC
@@ -1599,10 +1728,10 @@ hdfs.roundValue         1             Ro
 hdfs.roundUnit          second        The unit of the round down value - ``second``, ``minute`` or ``hour``.
 hdfs.timeZone           Local Time    Name of the timezone that should be used for resolving the directory path, e.g. America/Los_Angeles.
 hdfs.useLocalTimeStamp  false         Use the local time (instead of the timestamp from the event header) while replacing the escape sequences.
-hdfs.closeTries         0             Number of times the sink must try to close a file. If set to 1, this sink will not re-try a failed close
+hdfs.closeTries         0             Number of times the sink must try renaming a file, after initiating a close attempt. If set to 1, this sink will not re-try a failed rename
                                       (due to, for example, NameNode or DataNode failure), and may leave the file in an open state with a .tmp extension.
-                                      If set to 0, the sink will try to close the file until the file is eventually closed
-                                      (there is no limit on the number of times it would try).
+                                      If set to 0, the sink will try to rename the file until the file is eventually renamed (there is no limit on the number of times it would try).
+                                      The file may still remain open if the close call fails, but the data will be intact; in this case, the file will be closed only after a Flume restart.
 hdfs.retryInterval      180           Time in seconds between consecutive attempts to close a file. Each close call costs multiple RPC round-trips to the Namenode,
                                       so setting this too low can cause a lot of load on the name node. If set to 0 or less, the sink will not
                                       attempt to close the file if the first attempt fails, and may leave the file open or with a ".tmp" extension.
@@ -1630,6 +1759,149 @@ The above configuration will round down
 timestamp 11:54:34 AM, June 12, 2012 will cause the hdfs path to become ``/flume/events/2012-06-12/1150/00``.
 
 
+Hive Sink
+~~~~~~~~~
+
+This sink streams events containing delimited text or JSON data directly into a Hive table or partition.
+Events are written using Hive transactions. As soon as a set of events is committed to Hive, it becomes
+immediately visible to Hive queries. Partitions to which Flume will stream can either be pre-created
+or, optionally, Flume can create them if they are missing. Fields from incoming event data are mapped to
+corresponding columns in the Hive table. **This sink is provided as a preview feature and is not recommended
+for use in production.**
+
+======================    ============  ======================================================================
+Name                      Default       Description
+======================    ============  ======================================================================
+**channel**               --
+**type**                  --            The component type name, needs to be ``hive``
+**hive.metastore**        --            Hive metastore URI (e.g. thrift://a.b.com:9083)
+**hive.database**         --            Hive database name
+**hive.table**            --            Hive table name
+hive.partition            --            Comma-separated list of partition values identifying the partition to write to. May contain escape
+                                        sequences. E.g. if the table is partitioned by (continent: string, country: string, time: string)
+                                        then 'Asia,India,2014-02-26-01-21' will indicate continent=Asia,country=India,time=2014-02-26-01-21
+hive.txnsPerBatchAsk      100           Hive grants a *batch of transactions* instead of single transactions to streaming clients like Flume.
+                                        This setting configures the number of desired transactions per Transaction Batch. Data from all
+                                        transactions in a single batch end up in a single file. Flume will write a maximum of batchSize events
+                                        in each transaction in the batch. This setting in conjunction with batchSize provides control over the
+                                        size of each file. Note that eventually Hive will transparently compact these files into larger files.
+heartBeatInterval         240           (In seconds) Interval between consecutive heartbeats sent to Hive to keep unused transactions from expiring.
+                                        Set this value to 0 to disable heartbeats.
+autoCreatePartitions      true          Flume will automatically create the necessary Hive partitions to stream to
+batchSize                 15000         Max number of events written to Hive in a single Hive transaction
+maxOpenConnections        500           Allow only this number of open connections. If this number is exceeded, the least recently used connection is closed.
+callTimeout               10000         (In milliseconds) Timeout for Hive & HDFS I/O operations, such as openTxn, write, commit, abort.
+**serializer**            --            Serializer is responsible for parsing out fields from the event and mapping them to columns in the Hive table.
+                                        Choice of serializer depends upon the format of the data in the event. Supported serializers: DELIMITED and JSON
+roundUnit                 minute        The unit of the round down value - ``second``, ``minute`` or ``hour``.
+roundValue                1             Rounded down to the highest multiple of this (in the unit configured using ``roundUnit``), less than current time
+timeZone                  Local Time    Name of the timezone that should be used for resolving the escape sequences in partition, e.g. America/Los_Angeles.
+useLocalTimeStamp         false         Use the local time (instead of the timestamp from the event header) while replacing the escape sequences.
+======================    ============  ======================================================================
+
+The following serializers are provided for the Hive sink:
+
+**JSON**: Handles UTF-8 encoded JSON (strict syntax) events and requires no configuration. Object names
+in the JSON are mapped directly to columns with the same name in the Hive table.
+Internally uses org.apache.hive.hcatalog.data.JsonSerDe but is independent of the SerDe of the Hive table.
+This serializer requires HCatalog to be installed.
+
+**DELIMITED**: Handles simple delimited textual events.
+Internally uses LazySimpleSerDe but is independent of the SerDe of the Hive table.
+
+==========================    ============  ======================================================================
+Name                          Default       Description
+==========================    ============  ======================================================================
+serializer.delimiter          ,             (Type: string) The field delimiter in the incoming data. To use special
+                                            characters, surround them with double quotes like "\\t"
+**serializer.fieldnames**     --            The mapping from input fields to columns in the Hive table. Specified as a
+                                            comma-separated list (no spaces) of Hive table column names, identifying
+                                            the input fields in order of their occurrence. To skip fields leave the
+                                            column name unspecified. E.g. 'time,,ip,message' indicates the 1st, 3rd
+                                            and 4th fields in input map to time, ip and message columns in the Hive table.
+serializer.serdeSeparator     Ctrl-A        (Type: character) Customizes the separator used by underlying serde. There
+                                            can be a gain in efficiency if the fields in serializer.fieldnames are in
+                                            same order as table columns, the serializer.delimiter is same as the
+                                            serializer.serdeSeparator and number of fields in serializer.fieldnames
+                                            is less than or equal to number of table columns, as the fields in incoming
+                                            event body do not need to be reordered to match order of table columns.
+                                            Use single quotes for special characters like '\\t'.
+                                            Ensure input fields do not contain this character. NOTE: If serializer.delimiter
+                                            is a single character, preferably set this to the same character
+==========================    ============  ======================================================================
+
+
+The following are the escape sequences supported:
+
+=========  =================================================
+Alias      Description
+=========  =================================================
+%{host}    Substitute value of event header named "host". Arbitrary header names are supported.
+%t         Unix time in milliseconds
+%a         locale's short weekday name (Mon, Tue, ...)
+%A         locale's full weekday name (Monday, Tuesday, ...)
+%b         locale's short month name (Jan, Feb, ...)
+%B         locale's long month name (January, February, ...)
+%c         locale's date and time (Thu Mar 3 23:05:25 2005)
+%d         day of month (01)
+%D         date; same as %m/%d/%y
+%H         hour (00..23)
+%I         hour (01..12)
+%j         day of year (001..366)
+%k         hour ( 0..23)
+%m         month (01..12)
+%M         minute (00..59)
+%p         locale's equivalent of am or pm
+%s         seconds since 1970-01-01 00:00:00 UTC
+%S         second (00..59)
+%y         last two digits of year (00..99)
+%Y         year (2010)
+%z         +hhmm numeric timezone (for example, -0400)
+=========  =================================================
+
+
+.. note:: For all of the time related escape sequences, a header with the key
+          "timestamp" must exist among the headers of the event (unless ``useLocalTimeStamp`` is set to ``true``). One way to add
+          this automatically is to use the TimestampInterceptor.
+
+Example Hive table:
+
+.. code-block:: sql
+
+ create table weblogs ( id int , msg string )
+     partitioned by (continent string, country string, time string)
+     clustered by (id) into 5 buckets
+     stored as orc;
+
+Example for agent named a1:
+
+.. code-block:: properties
+
+ a1.channels = c1
+ a1.channels.c1.type = memory
+ a1.sinks = k1
+ a1.sinks.k1.type = hive
+ a1.sinks.k1.channel = c1
+ a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
+ a1.sinks.k1.hive.database = logsdb
+ a1.sinks.k1.hive.table = weblogs
+ a1.sinks.k1.hive.partition = asia,%{country},%Y-%m-%d-%H-%M
+ a1.sinks.k1.useLocalTimeStamp = false
+ a1.sinks.k1.round = true
+ a1.sinks.k1.roundValue = 10
+ a1.sinks.k1.roundUnit = minute
+ a1.sinks.k1.serializer = DELIMITED
+ a1.sinks.k1.serializer.delimiter = "\t"
+ a1.sinks.k1.serializer.serdeSeparator = '\t'
+ a1.sinks.k1.serializer.fieldnames = id,,msg
+
+
+The above configuration will round down the timestamp to the last 10th minute. For example, an event with
+timestamp header set to 11:54:34 AM, June 12, 2012 and 'country' header set to 'india' will evaluate to the
+partition (continent='asia',country='india',time='2012-06-12-11-50'). The serializer is configured to
+accept tab separated input containing three fields and to skip the second field.
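+
+As a further sketch (an illustration under stated assumptions, not part of the configuration above), the same
+sink could use the JSON serializer instead of DELIMITED. This assumes each event body is a JSON object whose
+names match the ``weblogs`` columns, for example ``{"id": 1, "msg": "hello"}``, and that HCatalog is installed.
+The agent, sink and channel names are reused from the example above.
+
+.. code-block:: properties
+
+ # Illustrative variant only: JSON serializer, no serializer.* settings required
+ a1.sinks.k1.type = hive
+ a1.sinks.k1.channel = c1
+ a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
+ a1.sinks.k1.hive.database = logsdb
+ a1.sinks.k1.hive.table = weblogs
+ a1.sinks.k1.hive.partition = asia,%{country},%Y-%m-%d-%H-%M
+ a1.sinks.k1.serializer = JSON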
+
+
 Logger Sink
 ~~~~~~~~~~~
 
@@ -1641,6 +1913,7 @@ Property Name   Default  Description
 ==============  =======  ===========================================
 **channel**     --
 **type**        --       The component type name, needs to be ``logger``
+maxBytesToLog   16       Maximum number of bytes of the Event body to log
 ==============  =======  ===========================================
 
 Example for agent named a1:
@@ -1701,6 +1974,12 @@ This sink forms one half of Flume's tier
 sent to this sink are turned into Thrift events and sent to the configured
 hostname / port pair. The events are taken from the configured Channel in
 batches of the configured batch size.
+
+Thrift sink can be configured to start in secure mode by enabling kerberos authentication.
+To communicate with a Thrift source started in secure mode, the Thrift sink should also
+operate in secure mode. client-principal and client-keytab are the properties used by the
+Thrift sink to authenticate to the kerberos KDC. The server-principal represents the
+principal of the Thrift source this sink is configured to connect to in secure mode.
 Required properties are in **bold**.
 
 ==========================   =======  ==============================================
@@ -1714,6 +1993,15 @@ batch-size                   100      nu
 connect-timeout              20000    Amount of time (ms) to allow for the first (handshake) request.
 request-timeout              20000    Amount of time (ms) to allow for requests after the first.
 connection-reset-interval    none     Amount of time (s) before the connection to the next hop is reset. This will force the Thrift Sink to reconnect to the next hop. This will allow the sink to connect to hosts behind a hardware load-balancer when new hosts are added without having to restart the agent.
+ssl                          false    Set to true to enable SSL for this ThriftSink. When configuring SSL, you can optionally set a "truststore", "truststore-password" and "truststore-type"
+truststore                   --       The path to a custom Java truststore file. Flume uses the certificate authority information in this file to determine whether the remote Thrift Source's SSL authentication credentials should be trusted. If not specified, the default Java JSSE certificate authority files (typically "jssecacerts" or "cacerts" in the Oracle JRE) will be used.
+truststore-password          --       The password for the specified truststore.
+truststore-type              JKS      The type of the Java truststore. This can be "JKS" or other supported Java truststore type.
+exclude-protocols            SSLv3    Space-separated list of SSL/TLS protocols to exclude
+kerberos                     false    Set to true to enable kerberos authentication. In kerberos mode, client-principal, client-keytab and server-principal are required for successful authentication and communication to a kerberos enabled Thrift Source.
+client-principal             --       The kerberos principal used by the Thrift Sink to authenticate to the kerberos KDC.
+client-keytab                --       The keytab location used by the Thrift Sink in combination with the client-principal to authenticate to the kerberos KDC.
+server-principal             --       The kerberos principal of the Thrift Source to which the Thrift Sink is configured to connect.
 ==========================   =======  ==============================================
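+
+A minimal sketch of a Thrift sink running in kerberos mode, using the properties listed above. The host,
+port, principal names and keytab path are hypothetical placeholders and need to be replaced with values
+from the actual deployment.
+
+.. code-block:: properties
+
+  # Hypothetical host, principals and keytab path; adjust for your environment
+  a1.sinks.k1.type = thrift
+  a1.sinks.k1.channel = c1
+  a1.sinks.k1.hostname = 10.10.10.10
+  a1.sinks.k1.port = 4545
+  a1.sinks.k1.kerberos = true
+  a1.sinks.k1.client-principal = flume/client.example.org@EXAMPLE.ORG
+  a1.sinks.k1.client-keytab = /etc/flume/conf/flume.keytab
+  a1.sinks.k1.server-principal = flume/server.example.org@EXAMPLE.ORG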
 
 Example for agent named a1:
@@ -2011,7 +2299,9 @@ Property Name     Default
 **type**          --                                                                       The component type name, needs to be ``org.apache.flume.sink.elasticsearch.ElasticSearchSink``
 **hostNames**     --                                                                       Comma separated list of hostname:port, if the port is not present the default port '9300' will be used
 indexName         flume                                                                    The name of the index which the date will be appended to. Example 'flume' -> 'flume-yyyy-MM-dd'
+                                                                                           Arbitrary header substitution is supported, eg. %{header} replaces with value of named event header
 indexType         logs                                                                     The type to index the document to, defaults to 'log'
+                                                                                           Arbitrary header substitution is supported, eg. %{header} replaces with value of named event header
 clusterName       elasticsearch                                                            Name of the ElasticSearch cluster to connect to
 batchSize         100                                                                      Number of events to be written per txn.
 ttl               --                                                                       TTL in days, when set will cause the expired documents to be deleted automatically,
@@ -2024,6 +2314,10 @@ serializer        org.apache.flume.sink.
 serializer.*      --                                                                       Properties to be passed to the serializer.
 ================  ======================================================================== =======================================================================================================
 
+.. note:: Header substitution is a handy way to use the value of an event header to dynamically decide the indexName and indexType to use when storing the event.
+          Use this feature with caution, as the event submitter now has control of the indexName and indexType.
+          Furthermore, if the elasticsearch REST client is used then the event submitter has control of the URL path used.
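+
+As a rough sketch of header substitution, the fragment below takes both values from event headers. The header
+names ``app`` and ``logtype`` are hypothetical and would have to be set by the event submitter or an interceptor;
+the host name is a placeholder.
+
+.. code-block:: properties
+
+  # Hypothetical headers "app" and "logtype" choose the index name and type per event
+  a1.sinks.k1.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
+  a1.sinks.k1.hostNames = es-1.example.org:9300
+  a1.sinks.k1.indexName = %{app}
+  a1.sinks.k1.indexType = %{logtype}
+  a1.sinks.k1.channel = c1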
+
 Example for agent named a1:
 
 .. code-block:: properties
@@ -2040,18 +2334,13 @@ Example for agent named a1:
   a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
   a1.sinks.k1.channel = c1
 
-Kite Dataset Sink (experimental)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. warning::
-  This source is experimental and may change between minor versions of Flume.
-  Use at your own risk.
+Kite Dataset Sink
+~~~~~~~~~~~~~~~~~
 
-Experimental sink that writes events to a `Kite Dataset <http://kitesdk.org/docs/current/kite-data/guide.html>`_.
+Experimental sink that writes events to a `Kite Dataset <http://kitesdk.org/docs/current/guide/>`_.
 This sink will deserialize the body of each incoming event and store the
-resulting record in a Kite Dataset. It determines target Dataset by opening a
-repository URI, ``kite.repo.uri``, and loading a Dataset by name,
-``kite.dataset.name``.
+resulting record in a Kite Dataset. It determines target Dataset by loading a
+dataset by URI.
 
 The only supported serialization is avro, and the record schema must be passed
 in the event headers, using either ``flume.avro.schema.literal`` with the JSON
@@ -2065,20 +2354,104 @@ Note 2: In some cases, file rolling may
 has been exceeded. However, this delay will not exceed 5 seconds. In most
 cases, the delay is negligible.
 
-=======================  =======  ===========================================================
-Property Name            Default  Description
-=======================  =======  ===========================================================
-**channel**              --
-**type**                 --       Must be org.apache.flume.sink.kite.DatasetSink
-**kite.repo.uri**        --       URI of the repository to open
-**kite.dataset.name**    --       Name of the Dataset where records will be written
-kite.batchSize           100      Number of records to process in each batch
-kite.rollInterval        30       Maximum wait time (seconds) before data files are released
-auth.kerberosPrincipal   --       Kerberos user principal for secure authentication to HDFS
-auth.kerberosKeytab      --       Kerberos keytab location (local FS) for the principal
-auth.proxyUser           --       The effective user for HDFS actions, if different from
-                                  the kerberos principal
-=======================  =======  ===========================================================
+============================  =======  ===========================================================
+Property Name                 Default  Description
+============================  =======  ===========================================================
+**channel**                   --
+**type**                      --       Must be org.apache.flume.sink.kite.DatasetSink
+**kite.dataset.uri**          --       URI of the dataset to open
+kite.repo.uri                 --       URI of the repository to open
+                                       (deprecated; use kite.dataset.uri instead)
+kite.dataset.namespace        --       Namespace of the Dataset where records will be written
+                                       (deprecated; use kite.dataset.uri instead)
+kite.dataset.name             --       Name of the Dataset where records will be written
+                                       (deprecated; use kite.dataset.uri instead)
+kite.batchSize                100      Number of records to process in each batch
+kite.rollInterval             30       Maximum wait time (seconds) before data files are released
+kite.flushable.commitOnBatch  true     If ``true``, the Flume transaction will be committed and the
+                                       writer will be flushed on each batch of ``kite.batchSize``
+                                       records. This setting only applies to flushable datasets. When
+                                       ``true``, it's possible for temp files with committed data to be
+                                       left in the dataset directory. These files need to be recovered
+                                       by hand for the data to be visible to DatasetReaders.
+kite.syncable.syncOnBatch     true     Controls whether the sink will also sync data when committing
+                                       the transaction. This setting only applies to syncable datasets.
+                                       Syncing guarantees that data will be written on stable storage
+                                       on the remote system while flushing only guarantees that data
+                                       has left Flume's client buffers. When the
+                                       ``kite.flushable.commitOnBatch`` property is set to ``false``,
+                                       this property must also be set to ``false``.
+kite.entityParser             avro     Parser that turns Flume ``Events`` into Kite entities.
+                                       Valid values are ``avro`` and the fully-qualified class name
+                                       of an implementation of the ``EntityParser.Builder`` interface.
+kite.failurePolicy            retry    Policy that handles non-recoverable errors such as a missing
+                                       ``Schema`` in the ``Event`` header. The default value, ``retry``,
+                                       will fail the current batch and try again which matches the old
+                                       behavior. Other valid values are ``save``, which will write the
+                                       raw ``Event`` to the ``kite.error.dataset.uri`` dataset, and the
+                                       fully-qualified class name of an implementation of the
+                                       ``FailurePolicy.Builder`` interface.
+kite.error.dataset.uri        --       URI of the dataset where failed events are saved when
+                                       ``kite.failurePolicy`` is set to ``save``. **Required** when
+                                       the ``kite.failurePolicy`` is set to ``save``.
+auth.kerberosPrincipal        --       Kerberos user principal for secure authentication to HDFS
+auth.kerberosKeytab           --       Kerberos keytab location (local FS) for the principal
+auth.proxyUser                --       The effective user for HDFS actions, if different from
+                                       the kerberos principal
+============================  =======  ===========================================================
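+
+A minimal sketch of a sink that uses ``kite.dataset.uri`` together with the ``save`` failure policy. The
+dataset URIs below are hypothetical, and their ``dataset:hdfs://...`` form is an assumption about Kite's
+URI scheme that should be checked against the Kite documentation for the version in use.
+
+.. code-block:: properties
+
+  # Hypothetical dataset URIs; events that cannot be parsed are saved to the error dataset
+  a1.sinks.k1.type = org.apache.flume.sink.kite.DatasetSink
+  a1.sinks.k1.channel = c1
+  a1.sinks.k1.kite.dataset.uri = dataset:hdfs://namenode.example.org:8020/datasets/default/events
+  a1.sinks.k1.kite.failurePolicy = save
+  a1.sinks.k1.kite.error.dataset.uri = dataset:hdfs://namenode.example.org:8020/datasets/default/failed_events
+  a1.sinks.k1.kite.batchSize = 100
+  a1.sinks.k1.kite.rollInterval = 30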
+
+
+Kafka Sink
+~~~~~~~~~~
+This is a Flume Sink implementation that can publish data to a
+`Kafka <http://kafka.apache.org/>`_ topic. One objective is to integrate Flume
+with Kafka so that pull based processing systems can process the data coming
+through various Flume sources. This currently supports the Kafka 0.8.x series of releases.
+
+Required properties are marked in bold font.
+
+
+===============================  ===================  =============================================================================================
+Property Name                    Default              Description
+===============================  ===================  =============================================================================================
+**type**                         --                   Must be set to ``org.apache.flume.sink.kafka.KafkaSink``
+**brokerList**                   --                   List of brokers Kafka-Sink will connect to, to get the list of topic partitions
+                                                      This can be a partial list of brokers, but we recommend at least two for HA.
+                                                      The format is comma separated list of hostname:port
+topic                            default-flume-topic  The topic in Kafka to which the messages will be published. If this parameter is configured,
+                                                      messages will be published to this topic.
+                                                      If the event header contains a "topic" field, the event will be published to that topic
+                                                      overriding the topic configured here.
+batchSize                        100                  How many messages to process in one batch. Larger batches improve throughput while adding latency.
+requiredAcks                     1                    How many replicas must acknowledge a message before it is considered successfully written.
+                                                      Accepted values are 0 (Never wait for acknowledgement), 1 (wait for leader only), -1 (wait for all replicas)
+                                                      Set this to -1 to avoid data loss in some cases of leader failure.
+Other Kafka Producer Properties  --                   These properties are used to configure the Kafka Producer. Any producer property supported
+                                                      by Kafka can be used. The only requirement is to prepend the property name with the prefix ``kafka.``.
+                                                      For example: kafka.producer.type
+===============================  ===================  =============================================================================================
+
+.. note::   Kafka Sink uses the ``topic`` and ``key`` properties from the FlumeEvent headers to send events to Kafka.
+            If ``topic`` exists in the headers, the event will be sent to that specific topic, overriding the topic configured for the Sink.
+            If ``key`` exists in the headers, the key will be used by Kafka to partition the data between the topic partitions. Events with same key
+            will be sent to the same partition. If the key is null, events will be sent to random partitions.
+
+An example configuration of a Kafka sink is given below. Properties starting
+with the prefix ``kafka`` are used when instantiating the Kafka producer.
+The properties that are passed when creating the Kafka producer are not
+limited to the properties given in this example.
+It is also possible to include your custom properties here and access them inside
+the preprocessor through the Flume Context object passed in as a method
+argument.
+
+.. code-block:: properties
+
+    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
+    a1.sinks.k1.topic = mytopic
+    a1.sinks.k1.brokerList = localhost:9092
+    a1.sinks.k1.requiredAcks = 1
+    a1.sinks.k1.batchSize = 20
+    a1.sinks.k1.channel = c1
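+
+As a sketch of passing a producer property through the ``kafka.`` prefix, the variant below sets
+``kafka.producer.type`` (the property named in the table above) and waits for all replicas to acknowledge.
+The broker host names are hypothetical, and the value ``sync`` assumes the Kafka 0.8.x producer settings.
+
+.. code-block:: properties
+
+    # Hypothetical brokers; kafka.* properties are handed to the Kafka producer
+    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
+    a1.sinks.k1.topic = mytopic
+    a1.sinks.k1.brokerList = kafka-1.example.org:9092,kafka-2.example.org:9092
+    a1.sinks.k1.requiredAcks = -1
+    a1.sinks.k1.batchSize = 20
+    a1.sinks.k1.kafka.producer.type = sync
+    a1.sinks.k1.channel = c1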
 
 Custom Sink
 ~~~~~~~~~~~
@@ -2189,6 +2562,60 @@ Example for agent named a1:
   a1.channels = c1
   a1.channels.c1.type = jdbc
 
+Kafka Channel
+~~~~~~~~~~~~~
+
+The events are stored in a Kafka cluster (must be installed separately). Kafka provides high availability and
+replication, so in case an agent or a Kafka broker crashes, the events are immediately available to other sinks.
+
+The Kafka channel can be used for multiple scenarios:
+
+* With Flume source and sink - it provides a reliable and highly available channel for events
+* With Flume source and interceptor but no sink - it allows writing Flume events into a Kafka topic, for use by other apps
+* With Flume sink, but no source - it is a low-latency, fault tolerant way to send events from Kafka to Flume sinks such as HDFS, HBase or Solr
+
+Required properties are in **bold**.
+
+======================  ==========================  ===============================================================================================================
+Property Name           Default                     Description
+======================  ==========================  ===============================================================================================================
+**type**                --                          The component type name, needs to be ``org.apache.flume.channel.kafka.KafkaChannel``
+**brokerList**          --                          List of brokers in the Kafka cluster used by the channel
+                                                    This can be a partial list of brokers, but we recommend at least two for HA.
+                                                    The format is comma separated list of hostname:port
+**zookeeperConnect**    --                          URI of ZooKeeper used by Kafka cluster
+                                                    The format is comma separated list of hostname:port. If chroot is used, it is added once at the end.
+                                                    For example: zookeeper-1:2181,zookeeper-2:2182,zookeeper-3:2181/kafka
+topic                   flume-channel               Kafka topic which the channel will use
+groupId                 flume                       Consumer group ID the channel uses to register with Kafka.
+                                                    Multiple channels must use the same topic and group to ensure that when one agent fails another can get the data
+                                                    Note that having non-channel consumers with the same ID can lead to data loss.
+parseAsFlumeEvent       true                        Expecting Avro datums with FlumeEvent schema in the channel.
+                                                    This should be true if a Flume source is writing to the channel
+                                                    and false if other producers are writing into the topic that the channel is using.
+                                                    Flume source messages to Kafka can be parsed outside of Flume by using
+                                                    org.apache.flume.source.avro.AvroFlumeEvent provided by the flume-ng-sdk artifact.
+readSmallestOffset      false                       When set to true, the channel will read all data in the topic, starting from the oldest event;
+                                                    when false, it will read only events written after the channel started.
+                                                    When "parseAsFlumeEvent" is true, this will be false. Flume source will start prior to the sinks and this
+                                                    guarantees that events sent by source before sinks start will not be lost.
+Other Kafka Properties  --                          These properties are used to configure the Kafka Producer and Consumer used by the channel.
+                                                    Any property supported by Kafka can be used.
+                                                    The only requirement is to prepend the property name with the prefix ``kafka.``.
+                                                    For example: kafka.producer.type
+======================  ==========================  ===============================================================================================================
+
+.. note:: Due to the way the channel is load balanced, there may be duplicate events when the agent first starts up
+
+Example for agent named a1:
+
+.. code-block:: properties
+
+    a1.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
+    a1.channels.channel1.capacity = 10000
+    a1.channels.channel1.transactionCapacity = 1000
+    a1.channels.channel1.brokerList = kafka-2:9092,kafka-3:9092
+    a1.channels.channel1.topic = channel1
+    a1.channels.channel1.zookeeperConnect = kafka-1:2181
 
 File Channel
 ~~~~~~~~~~~~
@@ -2211,6 +2638,7 @@ capacity
 keep-alive                                        3                                 Amount of time (in sec) to wait for a put operation
 use-log-replay-v1                                 false                             Expert: Use old replay logic
 use-fast-replay                                   false                             Expert: Replay without using queue
+checkpointOnClose                                 true                              Controls if a checkpoint is created when the channel is closed. Creating a checkpoint on close speeds up subsequent startup of the file channel by avoiding replay.
 encryption.activeKey                              --                                Key name used to encrypt new data
 encryption.cipherProvider                         --                                Cipher provider type, supported types: AESCTRNOPADDING
 encryption.keyProvider                            --                                Key provider type, supported types: JCEKSFILE
@@ -2533,7 +2961,12 @@ that so long as one is available events
 The failover mechanism works by relegating failed sinks to a pool where
 they are assigned a cool down period, increasing with sequential failures
 before they are retried. Once a sink successfully sends an event, it is
-restored to the live pool.
+restored to the live pool. Sinks have a priority associated with them:
+the larger the number, the higher the priority. If a Sink fails while sending an Event,
+the Sink with the next highest priority is tried next for sending Events.
+For example, a sink with priority 100 is activated before a sink with priority
+80. If no priority is specified, the priority is determined by the order in which
+the Sinks are specified in the configuration.
 
 To configure, set a sink groups processor to ``failover`` and set
 priorities for all individual sinks. All specified priorities must
@@ -2547,8 +2980,9 @@ Property Name                      Defau
 =================================  ===========  ===================================================================================
 **sinks**                          --           Space-separated list of sinks that are participating in the group
 **processor.type**                 ``default``  The component type name, needs to be ``failover``
-**processor.priority.<sinkName>**  --             <sinkName> must be one of the sink instances associated with the current sink group
-processor.maxpenalty               30000        (in millis)
+**processor.priority.<sinkName>**  --           Priority value. <sinkName> must be one of the sink instances associated with the current sink group.
+                                                A Sink with a higher priority value gets activated earlier. A larger absolute value indicates higher priority.
+processor.maxpenalty               30000        The maximum backoff period for the failed Sink (in millis)
 =================================  ===========  ===================================================================================
 
 Example for agent named a1:
@@ -2836,6 +3270,45 @@ Sample flume.conf file:
   a1.sources.avroSrc.interceptors.morphlineinterceptor.morphlineFile = /etc/flume-ng/conf/morphline.conf
   a1.sources.avroSrc.interceptors.morphlineinterceptor.morphlineId = morphline1
 
+Search and Replace Interceptor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This interceptor provides simple string-based search-and-replace functionality
+based on Java regular expressions. Backtracking / group capture is also available.
+This interceptor uses the same rules as in the Java Matcher.replaceAll() method.
+
+================  =======  ========================================================================
+Property Name     Default  Description
+================  =======  ========================================================================
+**type**          --       The component type name has to be ``search_replace``
+searchPattern     --       The pattern to search for and replace.
+replaceString     --       The replacement string.
+charset           UTF-8    The charset of the event body. Assumed by default to be UTF-8.
+================  =======  ========================================================================
+
+Example configuration:
+
+.. code-block:: properties
+
+  a1.sources.avroSrc.interceptors = search-replace
+  a1.sources.avroSrc.interceptors.search-replace.type = search_replace
+
+  # Remove leading alphanumeric characters in an event body.
+  a1.sources.avroSrc.interceptors.search-replace.searchPattern = ^[A-Za-z0-9_]+
+  a1.sources.avroSrc.interceptors.search-replace.replaceString =
+
+Another example:
+
+.. code-block:: properties
+
+  a1.sources.avroSrc.interceptors = search-replace
+  a1.sources.avroSrc.interceptors.search-replace.type = search_replace
+
+  # Use grouping operators to reorder and munge words on a line.
+  a1.sources.avroSrc.interceptors.search-replace.searchPattern = The quick brown ([a-z]+) jumped over the lazy ([a-z]+)
+  a1.sources.avroSrc.interceptors.search-replace.replaceString = The hungry $2 ate the careless $1
+
+
 Regex Filtering Interceptor
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2947,7 +3420,7 @@ Log4J Appender
 
 Appends Log4j events to a flume agent's avro source. A client using this
 appender must have the flume-ng-sdk in the classpath (eg,
-flume-ng-sdk-1.5.2.jar).
+flume-ng-sdk-1.6.0.jar).
 Required properties are in **bold**.
 
 =====================  =======  ==================================================================================
@@ -3011,7 +3484,7 @@ Load Balancing Log4J Appender
 
 Appends Log4j events to a list of flume agent's avro source. A client using this
 appender must have the flume-ng-sdk in the classpath (eg,
-flume-ng-sdk-1.5.2.jar). This appender supports a round-robin and random
+flume-ng-sdk-1.6.0.jar). This appender supports a round-robin and random
 scheme for performing the load balancing. It also supports a configurable backoff
 timeout so that down agents are removed temporarily from the set of hosts
 Required properties are in **bold**.
@@ -3076,9 +3549,14 @@ Sample log4j.properties file configured
 Security
 ========
 
-The HDFS sink supports Kerberos authentication if the underlying HDFS is
-running in secure mode. Please refer to the HDFS Sink section for
-configuring the HDFS sink Kerberos-related options.
+The HDFS sink, HBase sink, Thrift source, Thrift sink and Kite Dataset sink all support
+Kerberos authentication. Please refer to the corresponding sections for
+configuring the Kerberos-related options.
+
+The Flume agent authenticates to the Kerberos KDC as a single principal, which is
+used by all components that require Kerberos authentication. The principal and
+keytab configured for the Thrift source, Thrift sink, HDFS sink, HBase sink and Kite Dataset sink
+should be the same; otherwise the component will fail to start.
 
 Monitoring
 ==========
@@ -3087,6 +3565,16 @@ Monitoring in Flume is still a work in p
 Several Flume components report metrics to the JMX platform MBean server. These
 metrics can be queried using Jconsole.
 
+JMX Reporting
+-------------
+
+JMX Reporting can be enabled by specifying JMX parameters in the JAVA_OPTS environment variable using
+flume-env.sh, for example::
+
+  export JAVA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5445 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
+
+NOTE: The sample above disables security. To enable security, please refer to http://docs.oracle.com/javase/6/docs/technotes/guides/management/agent.html
+
 Ganglia Reporting
 -----------------
 Flume can also report these metrics to
@@ -3237,6 +3725,32 @@ metrics as long values.
 
   }
 
+Tools
+=====
+
+File Channel Integrity Tool
+---------------------------
+
+The File Channel Integrity tool verifies the integrity of individual Events in the File channel
+and removes corrupted Events.
+
+The tool can be run as follows::
+
+  $bin/flume-ng tool --conf ./conf FCINTEGRITYTOOL -l ./datadir
+
+where datadir is the comma-separated list of data directories to be verified.
+
+The following options are available:
+
+=======================  ====================================================================
+Option Name              Description
+=======================  ====================================================================
+h/help                   Displays help
+**l/dataDirs**           Comma-separated list of data directories which the tool must verify
+=======================  ====================================================================
+
+
+
 
 Topology Design Considerations
 ==============================
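
Note on the Search and Replace Interceptor hunk above: the guide says the
interceptor follows the same rules as Java's Matcher.replaceAll(). The sketch
below shows what the two sample configurations would do to an event body under
that rule, assuming the body is decoded with the default UTF-8 charset. The
class and method names (SearchReplaceSketch, apply, applyToString) are
hypothetical and are not part of Flume; only java.util.regex is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, not Flume code: reproduces the documented
// search-and-replace rule with plain java.util.regex.
class SearchReplaceSketch {

    // Decode the body as UTF-8, apply Matcher.replaceAll(), re-encode.
    static byte[] apply(byte[] body, String searchPattern, String replaceString) {
        String text = new String(body, StandardCharsets.UTF_8);
        String replaced = Pattern.compile(searchPattern)
                .matcher(text)
                .replaceAll(replaceString);
        return replaced.getBytes(StandardCharsets.UTF_8);
    }

    // Convenience wrapper so the examples can work with Strings.
    static String applyToString(String body, String searchPattern, String replaceString) {
        byte[] out = apply(body.getBytes(StandardCharsets.UTF_8), searchPattern, replaceString);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First sample: strip leading alphanumeric characters (empty replacement).
        System.out.println(applyToString("abc123 rest of line", "^[A-Za-z0-9_]+", ""));

        // Second sample: reorder words with the capture groups $1 and $2.
        System.out.println(applyToString(
                "The quick brown fox jumped over the lazy dog",
                "The quick brown ([a-z]+) jumped over the lazy ([a-z]+)",
                "The hungry $2 ate the careless $1"));
    }
}
```

Because replaceAll() interprets `$` and `\` in the replacement string, a
replaceString that should contain a literal dollar sign would need escaping
(for example via Matcher.quoteReplacement()).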

Modified: flume/site/trunk/content/sphinx/download.rst
URL: http://svn.apache.org/viewvc/flume/site/trunk/content/sphinx/download.rst?rev=1682982&r1=1682981&r2=1682982&view=diff
==============================================================================
--- flume/site/trunk/content/sphinx/download.rst (original)
+++ flume/site/trunk/content/sphinx/download.rst Mon Jun  1 19:49:44 2015
@@ -10,8 +10,8 @@ originals on the main distribution serve
 
 .. csv-table::
 
-   "Apache Flume binary (tar.gz)",  `apache-flume-1.5.2-bin.tar.gz <http://www.apache.org/dyn/closer.cgi/flume/1.5.2/apache-flume-1.5.2-bin.tar.gz>`_, `apache-flume-1.5.2-bin.tar.gz.md5 <http://www.us.apache.org/dist/flume/1.5.2/apache-flume-1.5.2-bin.tar.gz.md5>`_, `apache-flume-1.5.2-bin.tar.gz.asc <http://www.us.apache.org/dist/flume/1.5.2/apache-flume-1.5.2-bin.tar.gz.asc>`_
-  "Apache Flume source (tar.gz)",  `apache-flume-1.5.2-src.tar.gz <http://www.apache.org/dyn/closer.cgi/flume/1.5.2/apache-flume-1.5.2-src.tar.gz>`_, `apache-flume-1.5.2-src.tar.gz.md5 <http://www.us.apache.org/dist/flume/1.5.2/apache-flume-1.5.2-src.tar.gz.md5>`_, `apache-flume-1.5.2-src.tar.gz.asc <http://www.us.apache.org/dist/flume/1.5.2/apache-flume-1.5.2-src.tar.gz.asc>`_
+   "Apache Flume binary (tar.gz)",  `apache-flume-1.6.0-bin.tar.gz <http://www.apache.org/dyn/closer.cgi/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz>`_, `apache-flume-1.6.0-bin.tar.gz.md5 <http://www.us.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz.md5>`_, `apache-flume-1.6.0-bin.tar.gz.asc <http://www.us.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz.asc>`_
+  "Apache Flume source (tar.gz)",  `apache-flume-1.6.0-src.tar.gz <http://www.apache.org/dyn/closer.cgi/flume/1.6.0/apache-flume-1.6.0-src.tar.gz>`_, `apache-flume-1.6.0-src.tar.gz.md5 <http://www.us.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-src.tar.gz.md5>`_, `apache-flume-1.6.0-src.tar.gz.asc <http://www.us.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-src.tar.gz.asc>`_
 
 It is essential that you verify the integrity of the downloaded files using the PGP or MD5 signatures. Please read
 `Verifying Apache HTTP Server Releases <http://httpd.apache.org/dev/verification.html>`_ for more information on
@@ -23,9 +23,9 @@ as well as the asc signature file for th
 Then verify the signatures using::
 
     % gpg --import KEYS
-    % gpg --verify apache-flume-1.5.2-src.tar.gz.asc
+    % gpg --verify apache-flume-1.6.0-src.tar.gz.asc
 
-Apache Flume 1.5.2 is signed by Hari Shreedharan 77FFC9AB
+Apache Flume 1.6.0 is signed by Johny Rufus 2C79120F
 
 Alternatively, you can verify the MD5 or SHA1 signatures of the files. A program called md5, md5sum, or shasum is included in many
 Unix distributions for this purpose.

Modified: flume/site/trunk/content/sphinx/index.rst
URL: http://svn.apache.org/viewvc/flume/site/trunk/content/sphinx/index.rst?rev=1682982&r1=1682981&r2=1682982&view=diff
==============================================================================
--- flume/site/trunk/content/sphinx/index.rst (original)
+++ flume/site/trunk/content/sphinx/index.rst Mon Jun  1 19:49:44 2015
@@ -33,6 +33,38 @@ application.
 
 .. raw:: html
 
+   <h3>May 20, 2015 - Apache Flume 1.6.0 Released</h3>
+
+The Apache Flume team is pleased to announce the release of Flume 1.6.0.
+
+Flume is a distributed, reliable, and available service for efficiently
+collecting, aggregating, and moving large amounts of streaming event data.
+
+Version 1.6.0 is the ninth Flume release as an Apache top-level project.
+Flume 1.6.0 is stable, production-ready software, and is backwards-compatible
+with previous versions of the Flume 1.x codeline.
+
+Several months of active development went into this release: 105 patches were committed since 1.5.2, representing many features, enhancements, and bug fixes. While the full change log can be found on the 1.6.0 release page (link below), here are a few new feature highlights:
+
+    * Flume Sink and Source for Apache Kafka
+    * A new channel that uses Kafka
+    * Hive Sink based on the new Hive Streaming support
+    * End to End authentication in Flume
+    * Simple regex search-and-replace interceptor
+
+The full change log and documentation are available on the
+`Flume 1.6.0 release page <releases/1.6.0.html>`__.
+
+This release can be downloaded from the Flume `Download <download.html>`__ page.
+
+Your contributions, feedback, help and support make Flume better!
+For more information on how to report problems or contribute,
+please visit our `Get Involved <getinvolved.html>`__ page.
+
+The Apache Flume Team
+
+.. raw:: html
+
    <h3>November 18, 2014 - Apache Flume 1.5.2 Released</h3>
 
 The Apache Flume team is pleased to announce the release of Flume 1.5.2


