flume-commits mailing list archives

From build...@apache.org
Subject svn commit: r840975 [4/5] - in /websites/staging/flume/trunk/content: ./ .doctrees/ _sources/ releases/
Date Thu, 06 Dec 2012 19:05:40 GMT
Modified: websites/staging/flume/trunk/content/_sources/FlumeUserGuide.txt
==============================================================================
--- websites/staging/flume/trunk/content/_sources/FlumeUserGuide.txt (original)
+++ websites/staging/flume/trunk/content/_sources/FlumeUserGuide.txt Thu Dec  6 19:05:38 2012
@@ -15,7 +15,7 @@
 
 
 ======================================
-Flume 1.3.0-SNAPSHOT User Guide
+Flume 1.3.0 User Guide
 ======================================
 
 Introduction
@@ -60,8 +60,8 @@ recognized by the target Flume source. F
 used to receive Avro events from Avro clients or other Flume agents in the flow
 that send events from an Avro sink. When a Flume source receives an event, it
 stores it into one or more channels. The channel is a passive store that keeps
-the event until it's consumed by a Flume sink. The JDBC channel is one example
--- it uses a filesystem backed embedded database. The sink removes the event
+the event until it's consumed by a Flume sink. The file channel is one example
+-- it is backed by the local filesystem. The sink removes the event
 from the channel and puts it into an external repository like HDFS (via Flume
 HDFS sink) or forwards it to the Flume source of the next Flume agent (next
 hop) in the flow. The source and sink within the given agent run asynchronously
@@ -97,7 +97,7 @@ Recoverability
 ~~~~~~~~~~~~~~
 
 The events are staged in the channel, which manages recovery from failure.
-Flume supports a durable JDBC channel which is backed by a relational database.
+Flume supports a durable file channel which is backed by the local file system.
 There's also a memory channel which simply stores the events in an in-memory
 queue, which is faster but any events still left in the memory channel when an
 agent process dies can't be recovered.
@@ -132,10 +132,10 @@ Wiring the pieces together
 The agent needs to know what individual components to load and how they are
 connected in order to constitute the flow. This is done by listing the names of
 each of the sources, sinks and channels in the agent, and then specifying the
-connecting channel for each sink and source. For example, a agent flows events
-from an Avro source called avroWeb to HDFS sink hdfs-cluster1 via a JDBC
-channel called jdbc-channel. The configuration file will contain names of these
-components and jdbc-channel as a shared channel for both avroWeb source and
+connecting channel for each sink and source. For example, an agent flows events
+from an Avro source called avroWeb to HDFS sink hdfs-cluster1 via a file
+channel called file-channel. The configuration file will contain names of these
+components and file-channel as a shared channel for both avroWeb source and
 hdfs-cluster1 sink.
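 
 A minimal sketch of such a wiring (the agent name ``agent`` here is an
 illustrative assumption, not from the guide) might look like:
 
 .. code-block:: properties
 
   # illustrative sketch of the wiring described above
   agent.sources = avroWeb
   agent.sinks = hdfs-cluster1
   agent.channels = file-channel
 
   # file-channel is shared by the avroWeb source and the hdfs-cluster1 sink
   agent.sources.avroWeb.channels = file-channel
   agent.sinks.hdfs-cluster1.channel = file-channel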
 
 Starting an agent
@@ -152,41 +152,47 @@ properties file.
 
 A simple example
 ~~~~~~~~~~~~~~~~
-Here, we give an example configuration file, describing a single-node Flume deployment. This configuration lets a user generate events and subsequently logs them to the console.
+Here, we give an example configuration file, describing a single-node Flume deployment.
+This configuration lets a user generate events and subsequently logs them to the console.
 
 .. code-block:: properties
 
   # example.conf: A single-node Flume configuration
 
   # Name the components on this agent
-  agent1.sources = source1
-  agent1.sinks = sink1
-  agent1.channels = channel1
-
-  # Describe/configure source1
-  agent1.sources.source1.type = netcat
-  agent1.sources.source1.bind = localhost
-  agent1.sources.source1.port = 44444
+  a1.sources = r1
+  a1.sinks = k1
+  a1.channels = c1
+
+  # Describe/configure the source
+  a1.sources.r1.type = netcat
+  a1.sources.r1.bind = localhost
+  a1.sources.r1.port = 44444
 
-  # Describe sink1
-  agent1.sinks.sink1.type = logger
+  # Describe the sink
+  a1.sinks.k1.type = logger
 
   # Use a channel which buffers events in memory
-  agent1.channels.channel1.type = memory
-  agent1.channels.channel1.capacity = 1000
-  agent1.channels.channel1.transactionCapactiy = 100
+  a1.channels.c1.type = memory
+  a1.channels.c1.capacity = 1000
+  a1.channels.c1.transactionCapacity = 100
 
   # Bind the source and sink to the channel
-  agent1.sources.source1.channels = channel1
-  agent1.sinks.sink1.channel = channel1
+  a1.sources.r1.channels = c1
+  a1.sinks.k1.channel = c1
 
-This configuration defines a single agent, called *agent1*. *agent1* has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console. The configuration file names the various components, then describes their types and configuration parameters. A given configuration file might define several named agents; when a given Flume process is launched a flag is passed telling it which named agent to manifest.
+This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel
+that buffers event data in memory, and a sink that logs event data to the console. The configuration file names the
+various components, then describes their types and configuration parameters. A given configuration file might define
+several named agents; when a given Flume process is launched, a flag is passed telling it which named agent to manifest.
 
 Given this configuration file, we can start Flume as follows::
 
-  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.root.logger=INFO,console
+  $ bin/flume-ng agent --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
 
-Note that in a full deployment we would typically include one more option: ``--conf=<conf-dir>``. The ``<conf-dir>`` directory would include a shell script *flume-env.sh* and potentially a log4j properties file. In this example, we pass a Java option to force Flume to log to the console and we go without a custom environment script.
+Note that in a full deployment we would typically include one more option: ``--conf=<conf-dir>``.
+The ``<conf-dir>`` directory would include a shell script *flume-env.sh* and potentially a log4j properties file.
+In this example, we pass a Java option to force Flume to log to the console and we go without a custom environment script.
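+
+With such a directory in place, the full startup command might look like the
+following (a sketch; the ``conf`` directory name is an assumption)::
+
+  $ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console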
 
 From a separate terminal, we can then telnet to port 44444 and send Flume an event:
 
@@ -417,15 +423,15 @@ config to do that:
   # list the sources, sinks and channels in the agent
   agent_foo.sources = avro-AppSrv-source1 exec-tail-source2
   agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
-  agent_foo.channels = mem-channel-1 jdbc-channel-2
+  agent_foo.channels = mem-channel-1 file-channel-2
 
   # flow #1 configuration
   agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1
   agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
 
   # flow #2 configuration
-  agent_foo.sources.exec-tail-source2.channels = jdbc-channel-2
-  agent_foo.sinks.avro-forward-sink2.channel = jdbc-channel-2
+  agent_foo.sources.exec-tail-source2.channels = file-channel-2
+  agent_foo.sinks.avro-forward-sink2.channel = file-channel-2
 
 Configuring a multi agent flow
 ------------------------------
@@ -444,11 +450,11 @@ Weblog agent config:
   # list sources, sinks and channels in the agent
   agent_foo.sources = avro-AppSrv-source
   agent_foo.sinks = avro-forward-sink
-  agent_foo.channels = jdbc-channel
+  agent_foo.channels = file-channel
 
   # define the flow
-  agent_foo.sources.avro-AppSrv-source.channels = jdbc-channel
-  agent_foo.sinks.avro-forward-sink.channel = jdbc-channel
+  agent_foo.sources.avro-AppSrv-source.channels = file-channel
+  agent_foo.sinks.avro-forward-sink.channel = file-channel
 
   # avro sink properties
   agent_foo.sinks.avro-forward-sink.type = avro
@@ -545,28 +551,64 @@ agent named agent_foo has a single avro 
   # list the sources, sinks and channels in the agent
   agent_foo.sources = avro-AppSrv-source1
   agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
-  agent_foo.channels = mem-channel-1 jdbc-channel-2
+  agent_foo.channels = mem-channel-1 file-channel-2
 
   # set channels for source
-  agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1 jdbc-channel-2
+  agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1 file-channel-2
 
   # set channel for sinks
   agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
-  agent_foo.sinks.avro-forward-sink2.channel = jdbc-channel-2
+  agent_foo.sinks.avro-forward-sink2.channel = file-channel-2
 
   # channel selector configuration
   agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
   agent_foo.sources.avro-AppSrv-source1.selector.header = State
   agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
-  agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = jdbc-channel-2
-  agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 jdbc-channel-2
+  agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
+  agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
   agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
 
 The selector checks for a header called "State". If the value is "CA" then it's
-sent to mem-channel-1, if its "AZ" then it goes to jdbc-channel-2 or if its
+sent to mem-channel-1, if it's "AZ" then it goes to file-channel-2, or if it's
 "NY" then both. If the "State" header is not set or doesn't match any of the
 three, then it goes to mem-channel-1, which is designated as the 'default'.
 
+The selector also supports optional channels. To specify optional channels for
+a header, the config parameter 'optional' is used in the following way:
+
+.. code-block:: properties
+
+  # channel selector configuration
+  agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
+  agent_foo.sources.avro-AppSrv-source1.selector.header = State
+  agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
+  agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
+  agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
+  agent_foo.sources.avro-AppSrv-source1.selector.optional.CA = mem-channel-1 file-channel-2
+  agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
+
+The selector will attempt to write to the required channels first and will fail
+the transaction if even one of these channels fails to consume the events. The
+transaction is reattempted on **all** of the channels. Once all required
+channels have consumed the events, then the selector will attempt to write to
+the optional channels. A failure by any of the optional channels to consume the
+event is simply ignored and not retried.
+
+If there is an overlap between the optional channels and required channels for a
+specific header, the channel is considered to be required, and a failure in the
+channel will cause the entire set of required channels to be retried. For
+instance, in the above example, for the header "CA" mem-channel-1 is considered
+to be a required channel even though it is marked both as required and optional,
+and a failure to write to this channel will cause that
+event to be retried on **all** channels configured for the selector.
+
+Note that if a header does not have any required channels, then the event will
+be written to the default channels, and Flume will also attempt to write it to
+the optional channels for that header. In other words, specifying optional
+channels does not prevent the event from being written to the default channels
+when no required channels are specified.
+
 
 Flume Sources
 -------------
@@ -587,20 +629,22 @@ Property Name   Default      Description
 **bind**        --           hostname or IP address to listen on
 **port**        --           Port # to bind to
 threads         --           Maximum number of worker threads to spawn
+selector.type
+selector.*
 interceptors    --           Space separated list of interceptors
 interceptors.*
 ==============  ===========  ===================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = avrosource-1
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sources.avrosource-1.type = avro
-  agent_foo.sources.avrosource-1.channels = memoryChannel-1
-  agent_foo.sources.avrosource-1.bind = 0.0.0.0
-  agent_foo.sources.avrosource-1.port = 4141
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = avro
+  a1.sources.r1.channels = c1
+  a1.sources.r1.bind = 0.0.0.0
+  a1.sources.r1.port = 4141
 
 Exec Source
 ~~~~~~~~~~~
@@ -646,21 +690,74 @@ interceptors.*
              never guarantee data has been received when using a unidirectional
              asynchronous interface such as ExecSource! As an extension of this
              warning - and to be completely clear - there is absolutely zero guarantee
-             of event delivery when using this source. You have been warned.
+             of event delivery when using this source. For stronger reliability
+             guarantees, consider the Spooling Directory Source or direct integration
+             with Flume via the SDK.
 
 .. note:: You can use ExecSource to emulate TailSource from Flume 0.9x (flume og).
           Just use the unix command ``tail -F /full/path/to/your/file``. The -F
           option is better in this case than -f, as it also follows file rotation.
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
+
+.. code-block:: properties
+
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = exec
+  a1.sources.r1.command = tail -F /var/log/secure
+  a1.sources.r1.channels = c1
+
+Spooling Directory Source
+~~~~~~~~~~~~~~~~~~~~~~~~~
+This source lets you ingest data by dropping files in a spooling directory on
+disk. **Unlike other asynchronous sources, this source
+avoids data loss even if Flume is restarted or fails.**
+Flume will watch the directory for new files and ingest them as they appear.
+After a given file has been fully read into the channel, it is renamed to
+indicate completion. This allows a cleanup process to remove completed files
+periodically. Note, however, that events may be duplicated if failures occur,
+consistent with the semantics offered by other Flume components. The source
+optionally inserts the full path of the origin file into a header field of
+each event. This source buffers file data in memory during reads; be sure to
+set the ``bufferMaxLineLength`` option to a number greater than the longest
+line you expect to see in your input data.
+
+.. warning:: This source expects that only immutable, uniquely named files
+             are dropped in the spooling directory. If duplicate names are
+             used, or files are modified while being read, the source will
+             fail with an error message. For some use cases this may require
+             adding unique identifiers (such as a timestamp) to log file names
+             when they are copied into the spooling directory.
+
+====================  ==============  ==========================================================
+Property Name         Default         Description
+====================  ==============  ==========================================================
+**channels**          --
+**type**              --              The component type name, needs to be ``spooldir``
+**spoolDir**          --              The directory where log files will be spooled
+fileSuffix            .COMPLETED      Suffix to append to completely ingested files
+fileHeader            false           Whether to add a header storing the filename
+fileHeaderKey         file            Header key to use when appending filename to header
+batchSize             10              Granularity at which to batch transfer to the channel
+bufferMaxLines        100             Maximum number of lines the commit buffer can hold
+bufferMaxLineLength   5000            Maximum length of a line in the commit buffer
+selector.type         replicating     replicating or multiplexing
+selector.*                            Depends on the selector.type value
+interceptors          --              Space separated list of interceptors
+interceptors.*
+====================  ==============  ==========================================================
+
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = tailsource-1
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sources.tailsource-1.type = exec
-  agent_foo.sources.tailsource-1.command = tail -F /var/log/secure
-  agent_foo.sources.tailsource-1.channels = memoryChannel-1
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = spooldir
+  a1.sources.r1.spoolDir = /var/log/apache/flumeSpool
+  a1.sources.r1.fileHeader = true
+  a1.sources.r1.channels = c1
 
 NetCat Source
 ~~~~~~~~~~~~~
@@ -681,22 +778,23 @@ Property Name    Default      Descriptio
 **bind**         --           Host name or IP address to bind to
 **port**         --           Port # to bind to
 max-line-length  512          Max line length per event body (in bytes)
+ack-every-event  true         Respond with an "OK" for every event received
 selector.type    replicating  replicating or multiplexing
 selector.*                    Depends on the selector.type value
 interceptors     --           Space separated list of interceptors
 interceptors.*
 ===============  ===========  ===========================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = ncsource-1
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sources.ncsource-1.type = netcat
-  agent_foo.sources.ncsource-1.bind = 0.0.0.0
-  agent_foo.sources.ncsource-1.bind = 6666
-  agent_foo.sources.ncsource-1.channels = memoryChannel-1
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = netcat
+  a1.sources.r1.bind = 0.0.0.0
+  a1.sources.r1.port = 6666
+  a1.sources.r1.channels = c1
 
 Sequence Generator Source
 ~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -714,29 +812,32 @@ selector.type                replicating
 selector.*      replicating  Depends on the selector.type value
 interceptors    --           Space separated list of interceptors
 interceptors.*
+batchSize       1
 ==============  ===========  ========================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = ncsource-1
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sources.ncsource-1.type = seq
-  agent_foo.sources.ncsource-1.channels = memoryChannel-1
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = seq
+  a1.sources.r1.channels = c1
 
 Syslog Sources
 ~~~~~~~~~~~~~~
 
 Reads syslog data and generates Flume events. The UDP source treats an entire
-message as a single event. The TCP source on creates a new event for a string
-of characters separated by carriage return ('\n').
+message as a single event. The TCP sources create a new event for each string
+of characters separated by a newline ('\n').
 
 Required properties are in **bold**.
 
 Syslog TCP Source
 '''''''''''''''''
 
+The original, tried-and-true syslog TCP source.
+
 ==============   ===========  ==============================================
 Property Name    Default      Description
 ==============   ===========  ==============================================
@@ -744,24 +845,66 @@ Property Name    Default      Descriptio
 **type**         --           The component type name, needs to be ``syslogtcp``
 **host**         --           Host name or IP address to bind to
 **port**         --           Port # to bind to
-eventSize        2500
+eventSize        2500         Maximum size of a single event line, in bytes
 selector.type                 replicating or multiplexing
 selector.*       replicating  Depends on the selector.type value
 interceptors     --           Space separated list of interceptors
 interceptors.*
 ==============   ===========  ==============================================
 
+For example, a syslog TCP source for agent named a1:
+
+.. code-block:: properties
+
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = syslogtcp
+  a1.sources.r1.port = 5140
+  a1.sources.r1.host = localhost
+  a1.sources.r1.channels = c1
+
+Multiport Syslog TCP Source
+'''''''''''''''''''''''''''
+
+This is a newer, faster, multi-port capable version of the Syslog TCP source.
+Note that the ``ports`` configuration setting has replaced ``port``.
+Multi-port capability means that it can listen on many ports at once in an
+efficient manner. This source uses the Apache Mina library to do that.
+Provides support for RFC-3164 and many common RFC-5424 formatted messages.
+Also provides the capability to configure the character set used on a per-port
+basis.
+
+====================  ================  ==============================================
+Property Name         Default           Description
+====================  ================  ==============================================
+**channels**          --
+**type**              --                The component type name, needs to be ``multiport_syslogtcp``
+**host**              --                Host name or IP address to bind to.
+**ports**             --                Space-separated list (one or more) of ports to bind to.
+eventSize             2500              Maximum size of a single event line, in bytes.
+portHeader            --                If specified, the port number will be stored in the header of each event using the header name specified here. This allows for interceptors and channel selectors to customize routing logic based on the incoming port.
+charset.default       UTF-8             Default character set used while parsing syslog events into strings.
+charset.port.<port>   --                Character set is configurable on a per-port basis.
+batchSize             100               Maximum number of events to attempt to process per request loop. Using the default is usually fine.
+readBufferSize        1024              Size of the internal Mina read buffer. Provided for performance tuning. Using the default is usually fine.
+numProcessors         (auto-detected)   Number of processors available on the system for use while processing messages. Default is to auto-detect # of CPUs using the Java Runtime API. Mina will spawn 2 request-processing threads per detected CPU, which is often reasonable.
+selector.type         replicating       replicating, multiplexing, or custom
+selector.*            --                Depends on the ``selector.type`` value
+interceptors          --                Space separated list of interceptors.
+interceptors.*
+====================  ================  ==============================================
 
-For example, a syslog TCP source for agent named **agent_foo**:
+For example, a multiport syslog TCP source for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = syslogsource-1
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sources.syslogsource-1.type = syslogtcp
-  agent_foo.sources.syslogsource-1.port = 5140
-  agent_foo.sources.syslogsource-1.host = localhost
-  agent_foo.sources.syslogsource-1.channels = memoryChannel-1
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = multiport_syslogtcp
+  a1.sources.r1.channels = c1
+  a1.sources.r1.host = 0.0.0.0
+  a1.sources.r1.ports = 10001 10002 10003
+  a1.sources.r1.portHeader = port
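+
+To configure a character set for a specific port, append the port number to
+the ``charset.port.`` prefix; for example (an illustrative value):
+
+.. code-block:: properties
+
+  a1.sources.r1.charset.port.10001 = ISO-8859-1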
 
 Syslog UDP Source
 '''''''''''''''''
@@ -780,17 +923,98 @@ interceptors.*
 ==============  ===========  ==============================================
 
 
-For example, a syslog UDP source for agent named **agent_foo**:
+For example, a syslog UDP source for agent named a1:
+
+.. code-block:: properties
+
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = syslogudp
+  a1.sources.r1.port = 5140
+  a1.sources.r1.host = localhost
+  a1.sources.r1.channels = c1
+
+HTTP Source
+~~~~~~~~~~~
+A source which accepts Flume Events by HTTP POST and GET. GET should be used
+for experimentation only. HTTP requests are converted into Flume events by
+a pluggable "handler" which must implement the HTTPSourceHandler interface.
+This handler takes an HttpServletRequest and returns a list of
+Flume events. All events handled from one HTTP request are committed to the channel
+in one transaction, thus allowing for increased efficiency on channels like
+the file channel. If the handler throws an exception, this source will
+return an HTTP status of 400. If the channel is full, or the source is unable to
+append events to the channel, the source will return an HTTP 503 - Temporarily
+Unavailable status.
+
+All events sent in one post request are considered to be one batch and
+inserted into the channel in one transaction.
+
+==============  ==================================================  ====================================================================
+Property Name   Default                                             Description
+==============  ==================================================  ====================================================================
+**type**                                                            The FQCN of this class:  ``org.apache.flume.source.http.HTTPSource``
+**port**        --                                                  The port the source should bind to.
+handler         ``org.apache.flume.source.http.JSONHandler``        The FQCN of the handler class.
+handler.*       --                                                  Config parameters for the handler
+selector.type   replicating                                         replicating or multiplexing
+selector.*                                                          Depends on the selector.type value
+interceptors    --                                                  Space separated list of interceptors
+interceptors.*
+==============  ==================================================  ====================================================================
+
+For example, an HTTP source for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = syslogsource-1
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sources.syslogsource-1.type = syslogudp
-  agent_foo.sources.syslogsource-1.port = 5140
-  agent_foo.sources.syslogsource-1.host = localhost
-  agent_foo.sources.syslogsource-1.channels = memoryChannel-1
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
+  a1.sources.r1.port = 5140
+  a1.sources.r1.channels = c1
+  a1.sources.r1.handler = org.example.rest.RestHandler
+  a1.sources.r1.handler.nickname = random props
+
+JSONHandler
+'''''''''''
+A handler is provided out of the box which can handle events represented in
+JSON format, and supports the UTF-8, UTF-16 and UTF-32 character sets. The
+handler accepts an array of events (even if there is only one event, the event
+has to be sent in an array) and converts them to Flume events based on the
+encoding specified in the request. If no encoding is specified, UTF-8 is
+assumed. Events are represented as follows.
+
+.. code-block:: javascript
+
+  [{
+    "headers" : {
+               "timestamp" : "434324343",
+               "host" : "random_host.example.com"
+               },
+    "body" : "random_body"
+    },
+    {
+    "headers" : {
+               "namenode" : "namenode.example.com",
+               "datanode" : "random_datanode.example.com"
+               },
+    "body" : "really_random_body"
+    }]
+
+To set the charset, the request must have content type specified as
+``application/json; charset=UTF-8`` (replace UTF-8 with UTF-16 or UTF-32 as
+required).
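+
+For instance, one hypothetical way to exercise the default JSON handler
+(assuming an HTTP source listening on port 5140, as in the example above) is
+with curl:
+
+.. code-block:: bash
+
+  $ curl -X POST -H 'Content-Type: application/json; charset=UTF-8' \
+      --data '[{"headers": {"host": "random_host.example.com"}, "body": "random_body"}]' \
+      http://localhost:5140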
+
+One way to create an event in the format expected by this handler is to
+use JSONEvent provided in the Flume SDK and use Google Gson to create the JSON
+string using the Gson#toJson(Object, Type) method. The type token to pass as
+the 2nd argument of this method for a list of events can be created by:
 
+.. code-block:: java
+
+  Type type = new TypeToken<List<JSONEvent>>() {}.getType();
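+
+Putting it together, a minimal sketch (an illustration, not from the guide)
+that builds the JSON string for a single event using the Flume SDK and Gson
+might be:
+
+.. code-block:: java
+
+  // Illustrative sketch; assumes flume-ng-sdk and Gson on the classpath
+  // (org.apache.flume.event.JSONEvent, com.google.gson.*).
+  Gson gson = new Gson();
+  List<JSONEvent> events = new ArrayList<JSONEvent>();
+  JSONEvent event = new JSONEvent();
+  event.setHeaders(Collections.singletonMap("host", "random_host.example.com"));
+  event.setBody("random_body".getBytes(Charset.forName("UTF-8")));
+  events.add(event);
+  Type type = new TypeToken<List<JSONEvent>>() {}.getType();
+  String json = gson.toJson(events, type);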
 
 Legacy Sources
 ~~~~~~~~~~~~~~
@@ -830,16 +1054,16 @@ interceptors    --           Space separ
 interceptors.*
 ==============  ===========  ========================================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = legacysource-1
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sources.legacysource-1.type = org.apache.flume.source.avroLegacy.AvroLegacySource
-  agent_foo.sources.legacysource-1.host = 0.0.0.0
-  agent_foo.sources.legacysource-1.bind = 6666
-  agent_foo.sources.legacysource-1.channels = memoryChannel-1
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = org.apache.flume.source.avroLegacy.AvroLegacySource
+  a1.sources.r1.host = 0.0.0.0
+  a1.sources.r1.bind = 6666
+  a1.sources.r1.channels = c1
 
 Thrift Legacy Source
 ''''''''''''''''''''
@@ -848,7 +1072,7 @@ Thrift Legacy Source
 Property Name   Default      Description
 ==============  ===========  ======================================================================================
 **channels**    --
-**type**        --           The component type name, needs to be ``org.apache.source.thriftLegacy.ThriftLegacySource``
+**type**        --           The component type name, needs to be ``org.apache.flume.source.thriftLegacy.ThriftLegacySource``
 **host**        --           The hostname or IP address to bind to
 **port**        --           The port # to listen on
 selector.type                replicating or multiplexing
@@ -857,16 +1081,16 @@ interceptors    --           Space separ
 interceptors.*
 ==============  ===========  ======================================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = legacysource-1
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sources.legacysource-1.type = org.apache.source.thriftLegacy.ThriftLegacySource
-  agent_foo.sources.legacysource-1.host = 0.0.0.0
-  agent_foo.sources.legacysource-1.bind = 6666
-  agent_foo.sources.legacysource-1.channels = memoryChannel-1
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = org.apache.flume.source.thriftLegacy.ThriftLegacySource
+  a1.sources.r1.host = 0.0.0.0
+  a1.sources.r1.bind = 6666
+  a1.sources.r1.channels = c1
 
 Custom Source
 ~~~~~~~~~~~~~
@@ -880,25 +1104,25 @@ Property Name   Default      Description
 ==============  ===========  ==============================================
 **channels**    --
 **type**        --           The component type name, needs to be your FQCN
-selector.type                replicating or multiplexing
+selector.type                ``replicating`` or ``multiplexing``
 selector.*      replicating  Depends on the selector.type value
 interceptors    --           Space separated list of interceptors
 interceptors.*
 ==============  ===========  ==============================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = legacysource-1
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sources.legacysource-1.type = your.namespace.YourClass
-  agent_foo.sources.legacysource-1.channels = memoryChannel-1
-  
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = org.example.MySource
+  a1.sources.r1.channels = c1
+
 Scribe Source
 ~~~~~~~~~~~~~
 
-Scribe is another type of ingest system. To adopt existing Scribe ingest system, 
+Scribe is another type of ingest system. To adopt an existing Scribe ingest system,
 Flume can use the ScribeSource, which is based on Thrift and uses a compatible
 transfer protocol. For deploying Scribe, please follow the guide from Facebook.
 Required properties are in **bold**.
@@ -908,19 +1132,21 @@ Property Name   Default      Description
 ==============  ===========  ==============================================
 **type**        --           The component type name, needs to be ``org.apache.flume.source.scribe.ScribeSource``
 port            1499         Port that Scribe should be connected
-workerThreads   5			 Handing threads number in Thrift
+workerThreads   5            Number of Thrift handler threads
+selector.type
+selector.*
 ==============  ===========  ==============================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = scribesource-1
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sources.scribesource-1.type = org.apache.flume.source.scribe.ScribeSource
-  agent_foo.sources.scribesource-1.port = 1463
-  agent_foo.sources.scribesource-1.workerThreads = 5
-  agent_foo.sources.scribesource-1.channels = memoryChannel-1
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.type = org.apache.flume.source.scribe.ScribeSource
+  a1.sources.r1.port = 1463
+  a1.sources.r1.workerThreads = 5
+  a1.sources.r1.channels = c1
 
 Flume Sinks
 -----------
@@ -985,49 +1211,54 @@ Name                    Default       De
 **type**                --            The component type name, needs to be ``hdfs``
 **hdfs.path**           --            HDFS directory path (eg hdfs://namenode/flume/webdata/)
 hdfs.filePrefix         FlumeData     Name prefixed to files created by Flume in hdfs directory
+hdfs.fileSuffix         --            Suffix to append to file (eg ``.avro`` - *NOTE: period is not automatically added*)
 hdfs.rollInterval       30            Number of seconds to wait before rolling current file
                                       (0 = never roll based on time interval)
 hdfs.rollSize           1024          File size to trigger roll, in bytes (0: never roll based on file size)
 hdfs.rollCount          10            Number of events written to file before it rolled
                                       (0 = never roll based on number of events)
-hdfs.batchSize          1             number of events written to file before it flushed to HDFS
-hdfs.txnEventMax        100
+hdfs.idleTimeout        0             Timeout after which inactive files get closed
+                                      (0 = disable automatic closing of idle files)
+hdfs.batchSize          100           number of events written to file before it is flushed to HDFS
 hdfs.codeC              --            Compression codec. One of the following: gzip, bzip2, lzo, snappy
 hdfs.fileType           SequenceFile  File format: currently ``SequenceFile``, ``DataStream`` or ``CompressedStream``
                                       (1) DataStream will not compress the output file; do not set codeC
                                       (2) CompressedStream requires hdfs.codeC set to an available codec
-hdfs.maxOpenFiles       5000
+hdfs.maxOpenFiles       5000          Allow only this number of open files. If this number is exceeded, the oldest file is closed.
 hdfs.writeFormat        --            "Text" or "Writable"
-hdfs.appendTimeout      1000
-hdfs.callTimeout        10000
+hdfs.callTimeout        10000         Number of milliseconds allowed for HDFS operations, such as open, write, flush, close.
+                                      This number should be increased if many HDFS timeout operations are occurring.
 hdfs.threadsPoolSize    10            Number of threads per HDFS sink for HDFS IO ops (open, write, etc.)
 hdfs.rollTimerPoolSize  1             Number of threads per HDFS sink for scheduling timed file rolling
 hdfs.kerberosPrincipal  --            Kerberos user principal for accessing secure HDFS
 hdfs.kerberosKeytab     --            Kerberos keytab for accessing secure HDFS
+hdfs.proxyUser
 hdfs.round              false         Should the timestamp be rounded down (if true, affects all time based escape sequences except %t)
 hdfs.roundValue         1             Rounded down to the highest multiple of this (in the unit configured using ``hdfs.roundUnit``), less than current time.
 hdfs.roundUnit          second        The unit of the round down value - ``second``, ``minute`` or ``hour``.
-serializer              ``TEXT``      Other possible options include ``AVRO_EVENT`` or the
+hdfs.timeZone           Local Time    Name of the timezone that should be used for resolving the directory path, e.g. America/Los_Angeles.
+serializer              ``TEXT``      Other possible options include ``avro_event`` or the
                                       fully-qualified class name of an implementation of the
                                       ``EventSerializer.Builder`` interface.
 serializer.*
 ======================  ============  ======================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sinks = hdfsSink-1
-  agent_foo.sinks.hdfsSink-1.type = hdfs
-  agent_foo.sinks.hdfsSink-1.channels = memoryChannel-1
-  agent_foo.sinks.hdfsSink-1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
-  agent_foo.sinks.hdfsSink-1.hdfs.filePrefix = events-
-  agent_foo.sinks.hdfsSink-1.hdfs.round = true
-  agent_foo.sinks.hdfsSink-1.hdfs.roundValue = 10
-  agent_foo.sinks.hdfsSink-1.hdfs.roundUnit = minute
+  a1.channels = c1
+  a1.sinks = k1 
+  a1.sinks.k1.type = hdfs
+  a1.sinks.k1.channel = c1
+  a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
+  a1.sinks.k1.hdfs.filePrefix = events-
+  a1.sinks.k1.hdfs.round = true
+  a1.sinks.k1.hdfs.roundValue = 10
+  a1.sinks.k1.hdfs.roundUnit = minute
 
-The above configuration will round down the timestamp to the last 10th minute. For example, an event with timestamp 11:54:34 AM, June 12, 2012 will cause the hdfs path to become ``/flume/events/2012-06-12/1150/00``.
+The above configuration will round down the timestamp to the last 10th minute. For example, an event with
+timestamp 11:54:34 AM, June 12, 2012 will cause the hdfs path to become ``/flume/events/2012-06-12/1150/00``.
 
 
 Logger Sink
@@ -1043,14 +1274,14 @@ Property Name   Default  Description
 **type**        --       The component type name, needs to be ``logger``
 ==============  =======  ===========================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sinks = loggerSink-1
-  agent_foo.sinks.loggerSink-1.type = logger
-  agent_foo.sinks.loggerSink-1.channels = memoryChannel-1
+  a1.channels = c1
+  a1.sinks = k1 
+  a1.sinks.k1.type = logger
+  a1.sinks.k1.channel = c1
 
 Avro Sink
 ~~~~~~~~~
@@ -1071,18 +1302,19 @@ Property Name    Default  Description
 batch-size       100      Number of events to batch together for sending.
 connect-timeout  20000    Amount of time (ms) to allow for the first (handshake) request.
 request-timeout  20000    Amount of time (ms) to allow for requests after the first.
 ===============  =======  ==============================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sinks = avroSink-1
-  agent_foo.sinks.avroSink-1.type = avro
-  agent_foo.sinks.avroSink-1.channels = memoryChannel-1
-  agent_foo.sinks.avroSink-1.hostname = 10.10.10.10
-  agent_foo.sinks.avroSink-1.port = 4545
+  a1.channels = c1
+  a1.sinks = k1 
+  a1.sinks.k1.type = avro
+  a1.sinks.k1.channel = c1
+  a1.sinks.k1.hostname = 10.10.10.10
+  a1.sinks.k1.port = 4545
 
 IRC Sink
 ~~~~~~~~
@@ -1109,17 +1341,17 @@ splitchars       \n       line separator
                           backslash, like this: "\\n")
 ===============  =======  ========================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sinks = ircSink-1
-  agent_foo.sinks.ircSink-1.type = irc
-  agent_foo.sinks.ircSink-1.channels = memoryChannel-1
-  agent_foo.sinks.ircSink-1.hostname = irc.yourdomain.com
-  agent_foo.sinks.ircSink-1.nick = flume
-  agent_foo.sinks.ircSink-1.chan = #flume
+  a1.channels = c1
+  a1.sinks = k1 
+  a1.sinks.k1.type = irc
+  a1.sinks.k1.channel = c1
+  a1.sinks.k1.hostname = irc.yourdomain.com
+  a1.sinks.k1.nick = flume
+  a1.sinks.k1.chan = #flume
 
 File Roll Sink
 ~~~~~~~~~~~~~~
@@ -1131,21 +1363,22 @@ Required properties are in **bold**.
 Property Name        Default  Description
 ===================  =======  ======================================================================================================================
 **channel**          --
-**type**             --       The component type name, needs to be ``FILE_ROLL``.
+**type**             --       The component type name, needs to be ``file_roll``.
 **sink.directory**   --       The directory where files will be stored
 sink.rollInterval    30       Roll the file every 30 seconds. Specifying 0 will disable rolling and cause all events to be written to a single file.
-sink.serializer      TEXT     Other possible options include AVRO_EVENT or the FQCN of an implementation of EventSerializer.Builder interface.
+sink.serializer      TEXT     Other possible options include ``avro_event`` or the FQCN of an implementation of EventSerializer.Builder interface.
+batchSize            100
 ===================  =======  ======================================================================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sinks = fileSink-1
-  agent_foo.sinks.fileSink-1.type = FILE_ROLL
-  agent_foo.sinks.fileSink-1.channels = memoryChannel-1
-  agent_foo.sinks.fileSink-1.sink.directory = /var/log/flume
+  a1.channels = c1
+  a1.sinks = k1 
+  a1.sinks.k1.type = file_roll
+  a1.sinks.k1.channel = c1
+  a1.sinks.k1.sink.directory = /var/log/flume
 
 Null Sink
 ~~~~~~~~~
@@ -1157,17 +1390,18 @@ Required properties are in **bold**.
 Property Name  Default  Description
 =============  =======  ==============================================
 **channel**    --
-**type**       --       The component type name, needs to be ``NULL``.
+**type**       --       The component type name, needs to be ``null``.
+batchSize      100
 =============  =======  ==============================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sinks = nullSink-1
-  agent_foo.sinks.nullSink-1.type = NULL
-  agent_foo.sinks.nullSink-1.channels = memoryChannel-1
+  a1.channels = c1
+  a1.sinks = k1 
+  a1.sinks.k1.type = null
+  a1.sinks.k1.channel = c1
 
 HBaseSinks
 ~~~~~~~~~~
@@ -1197,7 +1431,7 @@ Required properties are in **bold**.
 Property Name     Default                                                 Description
 ================  ======================================================  ========================================================================
 **channel**       --
-**type**          --                                                      The component type name, needs to be ``org.apache.flume.sink.HBaseSink``
+**type**          --                                                      The component type name, needs to be ``org.apache.flume.sink.hbase.HBaseSink``
 **table**         --                                                      The name of the table in Hbase to write to.
 **columnFamily**  --                                                      The column family in Hbase to write to.
 batchSize         100                                                     Number of events to be written per txn.
@@ -1205,17 +1439,17 @@ serializer        org.apache.flume.sink.
 serializer.*      --                                                      Properties to be passed to the serializer.
 ================  ======================================================  ========================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sinks = hbaseSink-1
-  agent_foo.sinks.hbaseSink-1.type = org.apache.flume.sink.hbase.HBaseSink
-  agent_foo.sinks.hbaseSink-1.table = foo_table
-  agent_foo.sinks.hbaseSink-1.columnFamily = bar_cf
-  agent_foo.sinks.hbaseSink-1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
-  agent_foo.sinks.hbaseSink-1.channels = memoryChannel-1
+  a1.channels = c1
+  a1.sinks = k1 
+  a1.sinks.k1.type = org.apache.flume.sink.hbase.HBaseSink
+  a1.sinks.k1.table = foo_table
+  a1.sinks.k1.columnFamily = bar_cf
+  a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
+  a1.sinks.k1.channel = c1
 
 AsyncHBaseSink
 ''''''''''''''
@@ -1235,7 +1469,7 @@ Required properties are in **bold**.
 Property Name     Default                                                       Description
 ================  ============================================================  ====================================================================================
 **channel**       --
-**type**          --                                                            The component type name, needs to be ``org.apache.flume.sink.AsyncHBaseSink``
+**type**          --                                                            The component type name, needs to be ``org.apache.flume.sink.hbase.AsyncHBaseSink``
 **table**         --                                                            The name of the table in Hbase to write to.
 **columnFamily**  --                                                            The column family in Hbase to write to.
 batchSize         100                                                           Number of events to be written per txn.
@@ -1245,17 +1479,60 @@ serializer        org.apache.flume.sink.
 serializer.*      --                                                            Properties to be passed to the serializer.
 ================  ============================================================  ====================================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sinks = hbaseSink-1
-  agent_foo.sinks.hbaseSink-1.type = org.apache.flume.sink.hbase.AsyncHBaseSink
-  agent_foo.sinks.hbaseSink-1.table = foo_table
-  agent_foo.sinks.hbaseSink-1.columnFamily = bar_cf
-  agent_foo.sinks.hbaseSink-1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
-  agent_foo.sinks.hbaseSink-1.channels = memoryChannel-1
+  a1.channels = c1
+  a1.sinks = k1 
+  a1.sinks.k1.type = org.apache.flume.sink.hbase.AsyncHBaseSink
+  a1.sinks.k1.table = foo_table
+  a1.sinks.k1.columnFamily = bar_cf
+  a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
+  a1.sinks.k1.channel = c1
+
+ElasticSearchSink
+'''''''''''''''''
+
+This sink writes data to ElasticSearch. A class implementing
+ElasticSearchEventSerializer, which is specified by the configuration, is used
+to convert the events into XContentBuilder instances which detail the fields
+and mappings that will be indexed. These are then written to ElasticSearch.
+The sink generates an index per day, allowing easier management than dealing
+with a single large index.
+The type is the FQCN: org.apache.flume.sink.elasticsearch.ElasticSearchSink
+Required properties are in **bold**.
+
+================  ==================================================================  =======================================================================================================
+Property Name     Default                                                             Description
+================  ==================================================================  =======================================================================================================
+**channel**       --
+**type**          --                                                                  The component type name, needs to be ``org.apache.flume.sink.elasticsearch.ElasticSearchSink``
+**hostNames**     --                                                                  Comma separated list of hostname:port; if the port is not present, the default port '9300' will be used
+indexName         flume                                                               The name of the index which the date will be appended to. Example 'flume' -> 'flume-yyyy-MM-dd'
+indexType         logs                                                                The type to index the document to, defaults to 'logs'
+clusterName       elasticsearch                                                       Name of the ElasticSearch cluster to connect to
+batchSize         100                                                                 Number of events to be written per txn.
+ttl               --                                                                  TTL in days, when set will cause the expired documents to be deleted automatically,
+                                                                                      if not set documents will never be automatically deleted
+serializer        org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
+serializer.*      --                                                                  Properties to be passed to the serializer.
+================  ==================================================================  =======================================================================================================
+
+Example for agent named a1:
+
+.. code-block:: properties
+
+  a1.channels = c1
+  a1.sinks = k1 
+  a1.sinks.k1.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
+  a1.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300
+  a1.sinks.k1.indexName = foo_index
+  a1.sinks.k1.indexType = bar_type
+  a1.sinks.k1.clusterName = foobar_cluster
+  a1.sinks.k1.batchSize = 500
+  a1.sinks.k1.ttl = 5
+  a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
+  a1.sinks.k1.channel = c1
 
 Custom Sink
 ~~~~~~~~~~~
@@ -1272,14 +1549,14 @@ Property Name  Default  Description
 **type**       --       The component type name, needs to be your FQCN
 =============  =======  ==============================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = memoryChannel-1
-  agent_foo.sinks = customSink-1
-  agent_foo.sinks.customSink-1.type = your.namespace.YourClass
-  agent_foo.sinks.customSink-1.channels = memoryChannel-1
+  a1.channels = c1
+  a1.sinks = k1 
+  a1.sinks.k1.type = org.example.MySink
+  a1.sinks.k1.channel = c1
 
 Flume Channels
 --------------
@@ -1304,13 +1581,13 @@ transactionCapacity  100      The max nu
 keep-alive           3        Timeout in seconds for adding or removing an event
 ===================  =======  ==============================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = memoryChannel-1
-  agent_foo.channels.memoryChannel-1.type = memory
-  agent_foo.channels.memoryChannel-1.capacity = 1000
+  a1.channels = c1
+  a1.channels.c1.type = memory
+  a1.channels.c1.capacity = 1000
 
 JDBC Channel
 ~~~~~~~~~~~~
@@ -1341,12 +1618,12 @@ sysprop.*                               
 sysprop.user.home                                                 Home path to store embedded Derby database
 ==========================  ====================================  =================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = jdbcChannel-1
-  agent_foo.channels.jdbcChannel-1.type = jdbc
+  a1.channels = c1
+  a1.channels.c1.type = jdbc
 
 Recoverable Memory Channel
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1368,6 +1645,9 @@ wal.rollSize            (0x04000000)    
 wal.minRetentionPeriod  300000                                           Min amount of time (in millis) to keep a log
 wal.workerInterval      60000                                            How often (in millis) the background worker checks for old logs
 wal.maxLogsSize         (0x20000000)                                     Total amt (in bytes) of logs to keep, excluding the current log
+capacity                100
+transactionCapacity     100
+keep-alive              3
 ======================  ===============================================  =========================================================================
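 
 Example for agent named a1 (a sketch; the channel's type FQCN,
 ``org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel``, is
 assumed rather than taken from the guide):
 
 .. code-block:: properties
 
   a1.channels = c1
   a1.channels.c1.type = org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
   a1.channels.c1.capacity = 100
   a1.channels.c1.keep-alive = 3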
 
 
@@ -1376,19 +1656,29 @@ File Channel
 
 Required properties are in **bold**.
 
-====================  ================================  ========================================================
+================================================  ================================  ========================================================
 Property Name                                     Default                           Description
-====================  ================================  ========================================================
-**type**              --                                The component type name, needs to be ``FILE``.
-checkpointDir         ~/.flume/file-channel/checkpoint  The directory where checkpoint file will be stored
-dataDirs              ~/.flume/file-channel/data        The directory where log files will be stored
-transactionCapacity   1000                              The maximum size of transaction supported by the channel
-checkpointInterval    30000                             Amount of time (in millis) between checkpoints
-maxFileSize           2146435071                        Max size (in bytes) of a single log file
-capacity              1000000                           Maximum capacity of the channel
-keep-alive            3                                 Amount of time (in sec) to wait for a put operation
-write-timeout         3                                 Amount of time (in sec) to wait for a write operation
-====================  ================================  ========================================================
+================================================  ================================  ========================================================
+**type**                                          --                                The component type name, needs to be ``file``.
+checkpointDir                                     ~/.flume/file-channel/checkpoint  The directory where checkpoint file will be stored
+dataDirs                                          ~/.flume/file-channel/data        The directory where log files will be stored
+transactionCapacity                               1000                              The maximum size of transaction supported by the channel
+checkpointInterval                                30000                             Amount of time (in millis) between checkpoints
+maxFileSize                                       2146435071                        Max size (in bytes) of a single log file
+capacity                                          1000000                           Maximum capacity of the channel
+keep-alive                                        3                                 Amount of time (in sec) to wait for a put operation
+write-timeout                                     3                                 Amount of time (in sec) to wait for a write operation
+checkpoint-timeout                                600                               Expert: Amount of time (in sec) to wait for a checkpoint
+use-log-replay-v1                                 false                             Expert: Use old replay logic
+use-fast-replay                                   false                             Expert: Replay without using queue
+encryption.activeKey                              --                                Key name used to encrypt new data
+encryption.cipherProvider                         --                                Cipher provider type, supported types: AESCTRNOPADDING
+encryption.keyProvider                            --                                Key provider type, supported types: JCEKSFILE
+encryption.keyProvider.keyStoreFile               --                                Path to the keystore file
+encryption.keyProvider.keyStorePasswordFile       --                                Path to the keystore password file
+encryption.keyProvider.keys                       --                                List of all keys (e.g. history of the activeKey setting)
+encryption.keyProvider.keys.*.passwordFile        --                                Path to the optional key password file
+================================================  ================================  ========================================================
 
 .. note:: By default the File Channel uses paths for checkpoint and data
           directories that are within the user home as specified above.
@@ -1402,14 +1692,69 @@ write-timeout         3                 
           be necessary to provide good performance where multiple disks are
           not available for checkpoint and data directories.
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = fileChannel-1
-  agent_foo.channels.fileChannel-1.type = file
-  agent_foo.channels.fileChannel-1.checkpointDir = /mnt/flume/checkpoint
-  agent_foo.channels.fileChannel-1.dataDirs = /mnt/flume/data
+  a1.channels = c1
+  a1.channels.c1.type = file
+  a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
+  a1.channels.c1.dataDirs = /mnt/flume/data
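+
+Where multiple disks are available, ``dataDirs`` accepts a comma separated
+list of directories, spreading the write load across disks; a sketch (the
+mount points below are illustrative):
+
+.. code-block:: properties
+
+  a1.channels = c1
+  a1.channels.c1.type = file
+  a1.channels.c1.checkpointDir = /mnt/disk1/flume/checkpoint
+  a1.channels.c1.dataDirs = /mnt/disk2/flume/data,/mnt/disk3/flume/data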
+
+**Encryption**
+
+Below are a few sample configurations:
+
+Generating a key with a password separate from the key store password:
+
+.. code-block:: bash
+
+  keytool -genseckey -alias key-0 -keypass keyPassword -keyalg AES \
+    -keysize 128 -validity 9000 -keystore test.keystore \
+    -storetype jceks -storepass keyStorePassword
+
+Generating a key whose password is the same as the key store password:
+
+.. code-block:: bash
+
+  keytool -genseckey -alias key-1 -keyalg AES -keysize 128 -validity 9000 \
+    -keystore src/test/resources/test.keystore -storetype jceks \
+    -storepass keyStorePassword
+
+A configuration using key-0 as the active key might then look like:
+
+.. code-block:: properties
+
+  a1.channels.c1.encryption.activeKey = key-0
+  a1.channels.c1.encryption.cipherProvider = AESCTRNOPADDING
+  a1.channels.c1.encryption.keyProvider = JCEKSFILE
+  a1.channels.c1.encryption.keyProvider.keyStoreFile = /path/to/my.keystore
+  a1.channels.c1.encryption.keyProvider.keyStorePasswordFile = /path/to/my.keystore.password
+  a1.channels.c1.encryption.keyProvider.keys = key-0
+
+Let's say you have aged key-0 out and new files should be encrypted with key-1:
+
+.. code-block:: properties
+
+  a1.channels.c1.encryption.activeKey = key-1
+  a1.channels.c1.encryption.cipherProvider = AESCTRNOPADDING
+  a1.channels.c1.encryption.keyProvider = JCEKSFILE
+  a1.channels.c1.encryption.keyProvider.keyStoreFile = /path/to/my.keystore
+  a1.channels.c1.encryption.keyProvider.keyStorePasswordFile = /path/to/my.keystore.password
+  a1.channels.c1.encryption.keyProvider.keys = key-0 key-1
+
+The same scenario as above, but key-0 has its own password:
+
+.. code-block:: properties
+
+  a1.channels.c1.encryption.activeKey = key-1
+  a1.channels.c1.encryption.cipherProvider = AESCTRNOPADDING
+  a1.channels.c1.encryption.keyProvider = JCEKSFILE
+  a1.channels.c1.encryption.keyProvider.keyStoreFile = /path/to/my.keystore
+  a1.channels.c1.encryption.keyProvider.keyStorePasswordFile = /path/to/my.keystore.password
+  a1.channels.c1.encryption.keyProvider.keys = key-0 key-1
+  a1.channels.c1.encryption.keyProvider.keys.key-0.passwordFile = /path/to/key-0.password
+
 
 Pseudo Transaction Channel
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1440,15 +1785,15 @@ Required properties are in **bold**.
 =============  =======  =================================================================
 Property Name  Default  Description
 =============  =======  =================================================================
-**type**       --       The component type name, needs to be a fully-qualified class name
+**type**       --       The component type name, needs to be an FQCN
 =============  =======  =================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.channels = customChannel-1
-  agent_foo.channels.customChannel-1.type = your.domain.YourClass
+  a1.channels = c1
+  a1.channels.c1.type = org.example.MyChannel
 
 Flume Channel Selectors
 -----------------------
@@ -1466,14 +1811,14 @@ Property Name  Default      Description
 selector.type  replicating  The component type name, needs to be ``replicating``
 =============  ===========  ================================================
 
-Example for agent named **agent_foo** and it's source called **source_foo**:
+Example for agent named a1 and its source called r1:
 
 .. code-block:: properties
 
-  agent_foo.sources = source_foo
-  agent_foo.channels = channel-1 channel-2 channel-3
-  agent_foo.source.source_foo.selector.type = replicating
-  agent_foo.source.source_foo.channels = channel-1 channel-2 channel-3
+  a1.sources = r1
+  a1.channels = c1 c2 c3
+  a1.sources.r1.selector.type = replicating
+  a1.sources.r1.channels = c1 c2 c3
 
 Multiplexing Channel Selector
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1489,17 +1834,17 @@ selector.default    --
 selector.mapping.*  --
 ==================  =====================  =================================================
 
-Example for agent named **agent_foo** and it's source called **source_foo**:
+Example for agent named a1 and its source called r1:
 
 .. code-block:: properties
 
-  agent_foo.sources = source_foo
-  agent_foo.channels = channel-1 channel-2 channel-3 channel-4
-  agent_foo.sources.source_foo.selector.type = multiplexing
-  agent_foo.sources.source_foo.selector.header = state
-  agent_foo.sources.source_foo.selector.mapping.CZ = channel-1
-  agent_foo.sources.source_foo.selector.mapping.US = channel-2 channel-3
-  agent_foo.sources.source_foo.selector.default = channel-4
+  a1.sources = r1
+  a1.channels = c1 c2 c3 c4
+  a1.sources.r1.selector.type = multiplexing
+  a1.sources.r1.selector.header = state
+  a1.sources.r1.selector.mapping.CZ = c1
+  a1.sources.r1.selector.mapping.US = c2 c3
+  a1.sources.r1.selector.default = c4
 
 Custom Channel Selector
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -1515,13 +1860,13 @@ Property Name  Default  Description
 selector.type  --       The component type name, needs to be your FQCN
 =============  =======  ==============================================
 
-Example for agent named **agent_foo** and it's source called **source_foo**:
+Example for agent named a1 and its source called r1:
 
 .. code-block:: properties
 
-  agent_foo.sources = source_foo
-  agent_foo.channels = channel-1
-  agent_foo.sources.source_foo.selector.type = your.namespace.YourClass
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.selector.type = org.example.MyChannelSelector
 
 Flume Sink Processors
 ---------------------
@@ -1536,18 +1881,18 @@ Required properties are in **bold**.
 ===================  ===========  =================================================================================
 Property Name        Default      Description
 ===================  ===========  =================================================================================
-**processor.sinks**  --           Space separated list of sinks that are participating in the group
+**sinks**            --           Space separated list of sinks that are participating in the group
 **processor.type**   ``default``  The component type name, needs to be ``default``, ``failover`` or ``load_balance``
 ===================  ===========  =================================================================================
 
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sinkgroups = group1
-  agent_foo.sinkgroups.group1.sinks = sink1 sink2
-  agent_foo.sinkgroups.group1.processor.type = load_balance
+  a1.sinkgroups = g1
+  a1.sinkgroups.g1.sinks = k1 k2
+  a1.sinkgroups.g1.processor.type = load_balance
 
 Default Sink Processor
 ~~~~~~~~~~~~~~~~~~~~~~
@@ -1578,22 +1923,22 @@ Required properties are in **bold**.
 =================================  ===========  ===================================================================================
 Property Name                      Default      Description
 =================================  ===========  ===================================================================================
-**processor.sinks**                --           Space separated list of sinks that are participating in the group
+**sinks**                          --           Space separated list of sinks that are participating in the group
 **processor.type**                 ``default``  The component type name, needs to be ``failover``
 **processor.priority.<sinkName>**  --             <sinkName> must be one of the sink instances associated with the current sink group
-processor.maxpenalty               30000        (in millis)
+processor.maxpenalty               30000        The maximum backoff period for the failed sink (in millis)
 =================================  ===========  ===================================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sinkgroups = group1
-  agent_foo.sinkgroups.group1.sinks = sink1 sink2
-  agent_foo.sinkgroups.group1.processor.type = failover
-  agent_foo.sinkgroups.group1.processor.priority.sink1 = 5
-  agent_foo.sinkgroups.group1.processor.priority.sink2 = 10
-  agent_foo.sinkgroups.group1.processor.maxpenalty = 10000
+  a1.sinkgroups = g1
+  a1.sinkgroups.g1.sinks = k1 k2
+  a1.sinkgroups.g1.processor.type = failover
+  a1.sinkgroups.g1.processor.priority.k1 = 5
+  a1.sinkgroups.g1.processor.priority.k2 = 10
+  a1.sinkgroups.g1.processor.maxpenalty = 10000
 
 
 Load balancing Sink Processor
@@ -1602,43 +1947,114 @@ Load balancing Sink Processor
 Load balancing sink processor provides the ability to load-balance flow over
 multiple sinks. It maintains an indexed list of active sinks on which the
 load must be distributed. Implementation supports distributing load using
-either via ``ROUND_ROBIN`` or via ``RANDOM`` selection mechanism. The choice
-of selection mechanism defaults to ``ROUND_ROBIN`` type, but can be overridden
-via configuration. Custom selection mechanisms are supported via custom
-classes that inherits from ``LoadBalancingSelector``.
+either ``round_robin`` or ``random`` selection mechanisms.
+The choice of selection mechanism defaults to ``round_robin``,
+but can be overridden via configuration. Custom selection mechanisms are
+supported via custom classes that inherit from ``AbstractSinkSelector``.
 
 When invoked, this selector picks the next sink using its configured selection
-mechanism and invokes it. In case the selected sink fails to deliver the event,
-the processor picks the next available sink via its configured selection mechanism.
-This implementation does not blacklist the failing sink and instead continues
-to optimistically attempt every available sink. If all sinks invocations
-result in failure, the selector propagates the failure to the sink runner.
+mechanism and invokes it. For ``round_robin`` and ``random``, if the selected
+sink fails to deliver the event, the processor picks the next available sink
+via its configured selection mechanism. This implementation does not
+blacklist the failing sink and instead continues to optimistically attempt
+every available sink. If all sink invocations result in failure, the selector
+propagates the failure to the sink runner.
+
+If ``backoff`` is enabled, the sink processor will blacklist
+sinks that fail, removing them from selection for a given timeout. When the
+timeout ends, if the sink is still unresponsive, the timeout is increased
+exponentially to avoid potentially getting stuck in long waits on
+unresponsive sinks.
+
 
 Required properties are in **bold**.
 
-=============================  ===============  ===============================================================
-Property Name                  Default          Description
-=============================  ===============  ===============================================================
-**processor.sinks**            --               Space separated list of sinks that are participating in the group
-**processor.type**             ``default``      The component type name, needs to be ``load_balance``
-processor.selector             ``ROUND_ROBIN``  Selection mechanism. Must be either ``ROUND_ROBIN``, ``RANDOM``
-                                                or custom FQDN to class that inherits from ``LoadBalancingSelector``
-=============================  ===============  ===============================================================
+====================================  ===============  ==========================================================================
+Property Name                         Default          Description
+====================================  ===============  ==========================================================================
+**sinks**                             --               Space separated list of sinks that are participating in the group
+**processor.type**                    ``default``      The component type name, needs to be ``load_balance``
+processor.backoff                     true             Should failed sinks be backed off exponentially.
+processor.selector                    ``round_robin``  Selection mechanism. Must be either ``round_robin``, ``random``
+                                                       or FQCN of custom class that inherits from ``AbstractSinkSelector``
+processor.selector.maxBackoffMillis   30000            Used by backoff selectors to limit exponential backoff (in milliseconds)
+====================================  ===============  ==========================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sinkgroups = group1
-  agent_foo.sinkgroups.group1.sinks = sink1 sink2
-  agent_foo.sinkgroups.group1.processor.type = load_balance
-  agent_foo.sinkgroups.group1.processor.selector = random
+  a1.sinkgroups = g1
+  a1.sinkgroups.g1.sinks = k1 k2
+  a1.sinkgroups.g1.processor.type = load_balance
+  a1.sinkgroups.g1.processor.backoff = true
+  a1.sinkgroups.g1.processor.selector = random
+
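+A custom selection mechanism is configured by supplying its FQCN instead;
+the class name below is hypothetical:
+
+.. code-block:: properties
+
+  a1.sinkgroups = g1
+  a1.sinkgroups.g1.sinks = k1 k2
+  a1.sinkgroups.g1.processor.type = load_balance
+  a1.sinkgroups.g1.processor.selector = org.example.MySinkSelector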
 
 Custom Sink Processor
 ~~~~~~~~~~~~~~~~~~~~~
 
 Custom sink processors are not supported at the moment.
 
+Event Serializers
+-----------------
+
+The ``file_roll`` sink and the ``hdfs`` sink both support the
+``EventSerializer`` interface. Details of the EventSerializers that ship with
+Flume are provided below.
+
+Body Text Serializer
+~~~~~~~~~~~~~~~~~~~~
+
+Alias: ``text``. This serializer writes the body of the event to an output
+stream without any transformation or modification. The event headers are
+ignored. Configuration options are as follows:
+
+=========================  ================  ===========================================================================
+Property Name              Default           Description
+=========================  ================  ===========================================================================
+appendNewline              true              Whether a newline will be appended to each event at write time. The default
+                                             of true assumes that events do not contain newlines, for legacy reasons.
+=========================  ================  ===========================================================================
+
+Example for agent named a1:
+
+.. code-block:: properties
+
+  a1.sinks = k1 
+  a1.sinks.k1.type = file_roll
+  a1.sinks.k1.channel = c1
+  a1.sinks.k1.sink.directory = /var/log/flume
+  a1.sinks.k1.sink.serializer = text
+  a1.sinks.k1.sink.serializer.appendNewline = false
+
+Avro Event Serializer
+~~~~~~~~~~~~~~~~~~~~~
+
+Alias: ``avro_event``. This serializer writes Flume events into an Avro
+container file. The schema used is the same schema used for Flume events
+in the Avro RPC mechanism. This serializer inherits from the
+``AbstractAvroEventSerializer`` class. Configuration options are as follows:
+
+==========================  ================  ===========================================================================
+Property Name               Default           Description
+==========================  ================  ===========================================================================
+syncIntervalBytes           2048000           Avro sync interval, in approximate bytes.
+compressionCodec            null              Avro compression codec. For supported codecs, see Avro's CodecFactory docs.
+==========================  ================  ===========================================================================
+
+Example for agent named a1:
+
+.. code-block:: properties
+
+  a1.sinks.k1.type = hdfs
+  a1.sinks.k1.channel = c1
+  a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
+  a1.sinks.k1.serializer = avro_event
+  a1.sinks.k1.serializer.compressionCodec = snappy
+
+
 Flume Interceptors
 ------------------
 
@@ -1655,18 +2071,23 @@ are named components, here is an example
 
 .. code-block:: properties
 
-  agent_foo.sources = source_foo
-  agent_foo.channels = channel-1
-  agent_foo.sources.source_foo.interceptors = a b
-  agent_foo.sources.source_foo.interceptors.a.type = org.apache.flume.interceptor.HostInterceptor$Builder
-  agent_foo.sources.source_foo.interceptors.a.preserveExisting = false
-  agent_foo.sources.source_foo.interceptors.a.hostHeader = hostname
-  agent_foo.sources.source_foo.interceptors.b.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
+  a1.sources = r1
+  a1.sinks = k1 
+  a1.channels = c1
+  a1.sources.r1.interceptors = i1 i2
+  a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
+  a1.sources.r1.interceptors.i1.preserveExisting = false
+  a1.sources.r1.interceptors.i1.hostHeader = hostname
+  a1.sources.r1.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
+  a1.sinks.k1.filePrefix = FlumeData.%{hostname}.%Y-%m-%d
+  a1.sinks.k1.channel = c1
 
 Note that the interceptor builders are passed to the type config parameter. The interceptors are themselves
 configurable and can be passed configuration values just like they are passed to any other configurable component.
 In the above example, events are passed to the HostInterceptor first and the events returned by the HostInterceptor
-are then passed along to the TimestampInterceptor.
+are then passed along to the TimestampInterceptor. You can specify either the fully qualified class name (FQCN)
+or the alias ``timestamp``. If you have multiple collectors writing to the same HDFS path, then you could also use
+the HostInterceptor.
 
 Timestamp Interceptor
 ~~~~~~~~~~~~~~~~~~~~~
@@ -1678,20 +2099,20 @@ can preserve an existing timestamp if it
 ================  =======  ========================================================================
 Property Name     Default  Description
 ================  =======  ========================================================================
-**type**          --       The component type name, has to be ``TIMESTAMP``
+**type**          --       The component type name, has to be ``timestamp`` or the FQCN
 preserveExisting  false    If the timestamp already exists, should it be preserved - true or false
 ================  =======  ========================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = source1
-  agent_foo.channels = channel1
-  agent_foo.sources.source1.channels =  channel1
-  agent_foo.sources.source1.type = SEQ
-  agent_foo.sources.source1.interceptors = inter1
-  agent_foo.sources.source1.interceptors.inter1.type = timestamp
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.channels =  c1
+  a1.sources.r1.type = seq
+  a1.sources.r1.interceptors = i1
+  a1.sources.r1.interceptors.i1.type = timestamp
 
 Host Interceptor
 ~~~~~~~~~~~~~~~~
@@ -1702,21 +2123,21 @@ with key ``host`` or a configured key wh
 ================  =======  ========================================================================
 Property Name     Default  Description
 ================  =======  ========================================================================
-**type**          --       The component type name, has to be ``HOST``
+**type**          --       The component type name, has to be ``host``
 preserveExisting  false    If the host header already exists, should it be preserved - true or false
 useIP             true     Use the IP Address if true, else use hostname.
 hostHeader        host     The header key to be used.
 ================  =======  ========================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = source_foo
-  agent_foo.channels = channel-1
-  agent_foo.sources.source_foo.interceptors = host_int
-  agent_foo.sources.source_foo.interceptors.host_int.type = host
-  agent_foo.sources.source_foo.interceptors.host_int.hostHeader = hostname
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.interceptors = i1
+  a1.sources.r1.interceptors.i1.type = host
+  a1.sources.r1.interceptors.i1.hostHeader = hostname
 
 Static Interceptor
 ~~~~~~~~~~~~~~~~~~
@@ -1729,38 +2150,99 @@ multiple static interceptors each defini
 ================  =======  ========================================================================
 Property Name     Default  Description
 ================  =======  ========================================================================
-**type**          --       The component type name, has to be ``STATIC``
+**type**          --       The component type name, has to be ``static``
 preserveExisting  true     If configured header already exists, should it be preserved - true or false
 key               key      Name of header that should be created
 value             value    Static value that should be created
 ================  =======  ========================================================================
 
-Example for agent named **agent_foo**:
+Example for agent named a1:
 
 .. code-block:: properties
 
-  agent_foo.sources = source1
-  agent_foo.channels = channel1
-  agent_foo.sources.source1.channels =  channel1
-  agent_foo.sources.source1.type = SEQ
-  agent_foo.sources.source1.interceptors = inter1
-  agent_foo.sources.source1.interceptors.inter1.type = static
-  agent_foo.sources.source1.interceptors.inter1.key = datacenter
-  agent_foo.sources.source1.interceptors.inter1.value = NEW_YORK
+  a1.sources = r1
+  a1.channels = c1
+  a1.sources.r1.channels =  c1
+  a1.sources.r1.type = seq
+  a1.sources.r1.interceptors = i1
+  a1.sources.r1.interceptors.i1.type = static
+  a1.sources.r1.interceptors.i1.key = datacenter
+  a1.sources.r1.interceptors.i1.value = NEW_YORK
 
 Regex Filtering Interceptor
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-This interceptor filters events selectively by interpreting the event body as text and matching the text against a configured regular expression. The supplied regular expression can be used to include events or exclude events.
+This interceptor filters events selectively by interpreting the event body as text and matching the text against a configured regular expression.
+The supplied regular expression can be used to include events or exclude events.
 
 ================  =======  ========================================================================
 Property Name     Default  Description
 ================  =======  ========================================================================
-**type**          --       The component type name has to be ``REGEX_FILTER``
+**type**          --       The component type name has to be ``regex_filter``
 regex             ".*"     Regular expression for matching against events
-excludeRegex      false    If true, regex determines events to exclude, otherwise regex determines events to include.
+excludeEvents     false    If true, regex determines events to exclude, otherwise regex determines
+                           events to include.
 ================  =======  ========================================================================
 
+Regex Extractor Interceptor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This interceptor extracts regex match groups using a specified regular expression and appends the match groups as headers on the event.
+It also supports pluggable serializers for formatting the match groups before adding them as event headers.
+
+================================ ========== =================================================================================================
+Property Name                    Default    Description
+================================ ========== =================================================================================================
+**type**                         --         The component type name has to be ``regex_extractor``
+**regex**                        --         Regular expression for matching against events
+**serializers**                  --         Space-separated list of serializers for mapping matches to header names and serializing their
+                                            values. (See example below)
+                                            Flume provides built-in support for the following serializers:
+                                            ``org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer``
+                                            ``org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer``
+serializers.<s1>.type            default    Must be ``default`` (org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer),
+                                            ``org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer``,
+                                            or the FQCN of a custom class that implements ``org.apache.flume.interceptor.RegexExtractorInterceptorSerializer``
+serializers.<s1>.\ **name**      --
+serializers.*                    --         Serializer-specific properties
+================================ ========== =================================================================================================
+
+The serializers are used to map the matches to a header name and a formatted header value; by default, you only need to specify
+the header name and the default ``org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer`` will be used.
+This serializer simply maps the matches to the specified header name and passes the value through as it was extracted by the regex.
+You can plug custom serializer implementations into the extractor using the fully qualified class name (FQCN) to format the matches
+in any way you like.
+
+Example 1:
+~~~~~~~~~~
+
+If the Flume event body contained ``1:2:3.4foobar5`` and the following configuration was used
+
+.. code-block:: properties
+
+  agent.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
+  agent.sources.r1.interceptors.i1.serializers = s1 s2 s3
+  agent.sources.r1.interceptors.i1.serializers.s1.name = one
+  agent.sources.r1.interceptors.i1.serializers.s2.name = two
+  agent.sources.r1.interceptors.i1.serializers.s3.name = three
+
+The extracted event will contain the same body, but the following headers will have been added: ``one=>1, two=>2, three=>3``
+
+Example 2:
+~~~~~~~~~~
+
+If the Flume event body contained ``2012-10-18 18:47:57,614 some log line`` and the following configuration was used
+
+.. code-block:: properties
+
+  agent.sources.r1.interceptors.i1.regex = ^(?:\\n)?(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d)
+  agent.sources.r1.interceptors.i1.serializers = s1
+  agent.sources.r1.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
+  agent.sources.r1.interceptors.i1.serializers.s1.name = timestamp
+  agent.sources.r1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm
+
+The extracted event will contain the same body, but the following headers will have been added: ``timestamp=>1350611220000``
+
 Flume Properties
 ----------------
 
@@ -1798,7 +2280,7 @@ Log4J Appender
 
 Appends Log4j events to a flume agent's avro source. A client using this
 appender must have the flume-ng-sdk in the classpath (eg,
-flume-ng-sdk-1.3.0-SNAPSHOT.jar).
+flume-ng-sdk-1.3.0.jar).
 Required properties are in **bold**.
 
 =============  =======  ==========================================================================
@@ -1848,7 +2330,7 @@ and can be specified in the flume-env.sh
 =======================  =======  =====================================================================================
 Property Name            Default  Description
 =======================  =======  =====================================================================================
-**type**                 --       The component type name, has to be ``GANGLIA``
+**type**                 --       The component type name, has to be ``ganglia``
 **hosts**                --       Comma separated list of ``hostname:port``
 pollInterval             60       Time, in seconds, between consecutive reporting to ganglia server
 isGanglia3               false    Ganglia server version is 3. By default, Flume sends in ganglia 3.1 format
@@ -1856,18 +2338,7 @@ isGanglia3               false    Gangli
 
 We can start Flume with Ganglia support as follows::
 
-  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=GANGLIA -Dflume.monitoring.hosts=com.example:1234,com.example2:5455
-
-Any custom flume components should use Java MBean ObjectNames which begin
-with ``org.apache.flume`` for Flume to report the metrics to Ganglia. This can
-be done by adding the ObjectName as follows(the name can be anything provided it
-starts with ``org.apache.flume``):
-
-.. code-block:: java
-
-  ObjectName objName = new ObjectName("org.apache.flume." + myClassName + ":type=" + name);
-
-  ManagementFactory.getPlatformMBeanServer().registerMBean(this, objName);
+  $ bin/flume-ng agent --conf-file example.conf --name a1 -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=com.example:1234,com.example2:5455
 
 JSON Reporting
 --------------
@@ -1909,13 +2380,13 @@ Here is an example:
 =======================  =======  =====================================================================================
 Property Name            Default  Description
 =======================  =======  =====================================================================================
-**type**                 --       The component type name, has to be ``HTTP``
+**type**                 --       The component type name, has to be ``http``
 port                     41414    The port to start the server on.
 =======================  =======  =====================================================================================
 
-We can start Flume with Ganglia support as follows::
+We can start Flume with JSON reporting support as follows::
 
-  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=HTTP -Dflume.monitoring.port=34545
+  $ bin/flume-ng agent --conf-file example.conf --name a1 -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
 
 Metrics will then be available at **http://<hostname>:<port>/metrics** webpage.
 Custom components can report metrics as mentioned in the Ganglia section above.
@@ -1929,7 +2400,7 @@ the same way the GangliaServer is used f
-mbean server to poll the mbeans for metrics. For example, if an HTTP
-monitoring service called ``HTTPReporting`` can be used as follows::
+mbean server to poll the mbeans for metrics. For example, an HTTP
+monitoring service called ``HTTPReporting`` can be used as follows::
 
-  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=com.example.reporting.HTTPReporting -Dflume.monitoring.node=com.example:332
+  $ bin/flume-ng agent --conf-file example.conf --name a1 -Dflume.monitoring.type=com.example.reporting.HTTPReporting -Dflume.monitoring.node=com.example:332
 
 =======================  =======  ========================================
 Property Name            Default  Description
@@ -1937,7 +2408,158 @@ Property Name            Default  Descri
 **type**                 --       The component type name, has to be FQCN

[... 284 lines stripped ...]

