flume-commits mailing list archives

From rgo...@apache.org
Subject svn commit: r1376696 - /flume/site/trunk/content/sphinx/FlumeUserGuide.rst
Date Thu, 23 Aug 2012 21:01:05 GMT
Author: rgoers
Date: Thu Aug 23 21:01:04 2012
New Revision: 1376696

URL: http://svn.apache.org/viewvc?rev=1376696&view=rev
Log:
Keep user guide in synch

Modified:
    flume/site/trunk/content/sphinx/FlumeUserGuide.rst

Modified: flume/site/trunk/content/sphinx/FlumeUserGuide.rst
URL: http://svn.apache.org/viewvc/flume/site/trunk/content/sphinx/FlumeUserGuide.rst?rev=1376696&r1=1376695&r2=1376696&view=diff
==============================================================================
--- flume/site/trunk/content/sphinx/FlumeUserGuide.rst (original)
+++ flume/site/trunk/content/sphinx/FlumeUserGuide.rst Thu Aug 23 21:01:04 2012
@@ -30,7 +30,7 @@ different sources to a centralized data 
 
 Apache Flume is a top level project at the Apache Software Foundation.
 There are currently two release code lines available, versions 0.9.x and 1.x.
-This documentation applies to the 1.x codeline.  
+This documentation applies to the 1.x codeline.
 Please click here for
 `the Flume 0.9.x User Guide <http://archive.cloudera.com/cdh/3/flume/UserGuide/>`_.
 
@@ -155,7 +155,7 @@ A simple example
 Here, we give an example configuration file, describing a single-node Flume deployment. This configuration lets a user generate events and subsequently logs them to the console.
 
 .. code-block:: properties
-   
+
   # example.conf: A single-node Flume configuration
 
   # Name the components on this agent
@@ -175,7 +175,7 @@ Here, we give an example configuration f
   agent1.channels.channel1.type = memory
   agent1.channels.channel1.capacity = 1000
   agent1.channels.channel1.transactionCapacity = 100
- 
+
   # Bind the source and sink to the channel
   agent1.sources.source1.channels = channel1
   agent1.sinks.sink1.channel = channel1
@@ -643,7 +643,7 @@ interceptors.*
              of indicating to the application writing the log file that it needs to
              retain the log or that the event hasn't been sent, for some reason. If
              this doesn't make sense, you need only know this: Your application can
-             never guarantee data has been received when using a unidirectional 
+             never guarantee data has been received when using a unidirectional
              asynchronous interface such as ExecSource! As an extension of this
              warning - and to be completely clear - there is absolutely zero guarantee
              of event delivery when using this source. You have been warned.
@@ -894,6 +894,33 @@ Example for agent named **agent_foo**:
   agent_foo.channels = memoryChannel-1
   agent_foo.sources.legacysource-1.type = your.namespace.YourClass
   agent_foo.sources.legacysource-1.channels = memoryChannel-1
+
+Scribe Source
+~~~~~~~~~~~~~
+
+Scribe is another type of ingest system. To integrate with an existing Scribe ingest system,
+Flume should use ScribeSource, which is based on Thrift and uses a compatible transfer protocol.
+For deploying Scribe, please follow the guide from Facebook.
+Required properties are in **bold**.
+
+==============  ===========  ==============================================
+Property Name   Default      Description
+==============  ===========  ==============================================
+**type**        --           The component type name, needs to be ``org.apache.flume.source.scribe.ScribeSource``
+port            1499         Port that Scribe should connect to
+workerThreads   5            Number of handler threads in Thrift
+==============  ===========  ==============================================
+
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+  agent_foo.sources = scribesource-1
+  agent_foo.channels = memoryChannel-1
+  agent_foo.sources.scribesource-1.type = org.apache.flume.source.scribe.ScribeSource
+  agent_foo.sources.scribesource-1.port = 1463
+  agent_foo.sources.scribesource-1.workerThreads = 5
+  agent_foo.sources.scribesource-1.channels = memoryChannel-1
 
 Flume Sinks
 -----------
@@ -1100,15 +1127,15 @@ File Roll Sink
 Stores events on the local filesystem.
 Required properties are in **bold**.
 
-=================  =======  ======================================================================================================================
-Property Name      Default  Description
-=================  =======  ======================================================================================================================
-**channel**        --
-**type**           --       The component type name, needs to be ``FILE_ROLL``.
-sink.directory     --
-sink.rollInterval  30       Roll the file every 30 seconds. Specifying 0 will disable rolling and cause all events to be written to a single file.
-sink.serializer    TEXT     Other possible options include AVRO_EVENT or the FQCN of an implementation of EventSerializer.Builder interface.
-=================  =======  ======================================================================================================================
+===================  =======  ======================================================================================================================
+Property Name        Default  Description
+===================  =======  ======================================================================================================================
+**channel**          --
+**type**             --       The component type name, needs to be ``FILE_ROLL``.
+**sink.directory**   --       The directory where files will be stored
+sink.rollInterval    30       Roll the file every 30 seconds. Specifying 0 will disable rolling and cause all events to be written to a single file.
+sink.serializer      TEXT     Other possible options include AVRO_EVENT or the FQCN of an implementation of the EventSerializer.Builder interface.
+===================  =======  ======================================================================================================================
 
 Example for agent named **agent_foo**:
 
@@ -1204,17 +1231,19 @@ This sink is still experimental.
 The type is the FQCN: org.apache.flume.sink.hbase.AsyncHBaseSink.
 Required properties are in **bold**.
 
-================  ============================================================  =============================================================================
+================  ============================================================  ====================================================================================
 Property Name     Default                                                       Description
-================  ============================================================  =============================================================================
+================  ============================================================  ====================================================================================
 **channel**       --
 **type**          --                                                            The component type name, needs to be ``org.apache.flume.sink.hbase.AsyncHBaseSink``
 **table**         --                                                            The name of the table in HBase to write to.
 **columnFamily**  --                                                            The column family in HBase to write to.
 batchSize         100                                                           Number of events to be written per transaction.
+timeout           --                                                            The length of time (in milliseconds) the sink waits for acks from HBase for
+                                                                                all events in a transaction. If no timeout is specified, the sink will wait forever.
 serializer        org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
 serializer.*      --                                                            Properties to be passed to the serializer.
-================  ============================================================  =============================================================================
+================  ============================================================  ====================================================================================
 
 Example for agent named **agent_foo**:
 
@@ -1361,8 +1390,8 @@ keep-alive            3                 
 write-timeout         3                                 Amount of time (in sec) to wait for a write operation
 ====================  ================================  ========================================================
 
-.. note:: By default the File Channel uses paths for checkpoint and data 
-          directories that are within the user home as specified above. 
+.. note:: By default the File Channel uses paths for checkpoint and data
+          directories that are within the user home as specified above.
           As a result, if you have more than one File Channel instance
           active within the agent, only one will be able to lock the
           directories, causing the initialization of the other channels to fail.
@@ -1649,10 +1678,21 @@ can preserve an existing timestamp if it
 ================  =======  ========================================================================
 Property Name     Default  Description
 ================  =======  ========================================================================
-type              --       The component type name, has to be ``TIMESTAMP``
+**type**          --       The component type name, has to be ``TIMESTAMP``
 preserveExisting  false    If the timestamp already exists, should it be preserved - true or false
 ================  =======  ========================================================================
 
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+  agent_foo.sources = source1
+  agent_foo.channels = channel1
+  agent_foo.sources.source1.channels =  channel1
+  agent_foo.sources.source1.type = SEQ
+  agent_foo.sources.source1.interceptors = inter1
+  agent_foo.sources.source1.interceptors.inter1.type = timestamp
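+
+The effect of ``preserveExisting`` can be illustrated with a minimal sketch. It is not the
+interceptor's implementation: event headers are modeled as a plain ``Map``, the class and method
+names are hypothetical, and a header key ``timestamp`` holding epoch milliseconds is an
+assumption. The current time is passed in as a parameter so that the behavior is easy to check.
+
+.. code-block:: java
+
+  import java.util.HashMap;
+  import java.util.Map;
+
+  public class TimestampHeaderSketch {
+
+    // Header key assumed to match the one the timestamp interceptor writes.
+    static final String TIMESTAMP = "timestamp";
+
+    // Writes nowMillis into the header unless preserveExisting is true and a
+    // timestamp header is already present.
+    static Map<String, String> intercept(Map<String, String> headers,
+                                         boolean preserveExisting, long nowMillis) {
+      if (preserveExisting && headers.containsKey(TIMESTAMP)) {
+        return headers;
+      }
+      headers.put(TIMESTAMP, Long.toString(nowMillis));
+      return headers;
+    }
+
+    public static void main(String[] args) {
+      Map<String, String> headers = new HashMap<>();
+      headers.put(TIMESTAMP, "1000");
+      intercept(headers, true, 2000L);
+      System.out.println(headers.get(TIMESTAMP)); // prints 1000 (preserved)
+      intercept(headers, false, 2000L);
+      System.out.println(headers.get(TIMESTAMP)); // prints 2000 (overwritten)
+    }
+  }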
+
 Host Interceptor
 ~~~~~~~~~~~~~~~~
 
@@ -1662,14 +1702,64 @@ with key ``host`` or a configured key wh
 ================  =======  ========================================================================
 Property Name     Default  Description
 ================  =======  ========================================================================
-type              --       The component type name, has to be ``HOST``
+**type**          --       The component type name, has to be ``HOST``
 preserveExisting  false    If the host header already exists, should it be preserved - true or false
 useIP             true     Use the IP Address if true, else use hostname.
 hostHeader        host     The header key to be used.
 ================  =======  ========================================================================
 
-In the example above, the key used in the event headers is "hostname"
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+  agent_foo.sources = source_foo
+  agent_foo.channels = channel-1
+  agent_foo.sources.source_foo.interceptors = host_int
+  agent_foo.sources.source_foo.interceptors.host_int.type = host
+  agent_foo.sources.source_foo.interceptors.host_int.hostHeader = hostname
+
+Static Interceptor
+~~~~~~~~~~~~~~~~~~
+
+The static interceptor allows a user to append a static header with a static value to all events.
+
+The current implementation does not allow specifying multiple headers at one time. Instead,
+users can chain multiple static interceptors, each defining one static header.
+
+================  =======  ========================================================================
+Property Name     Default  Description
+================  =======  ========================================================================
+**type**          --       The component type name, has to be ``STATIC``
+preserveExisting  true     If the configured header already exists, should it be preserved - true or false
+key               key      Name of the header that should be created
+value             value    Static value of the header that should be created
+================  =======  ========================================================================
+
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+  agent_foo.sources = source1
+  agent_foo.channels = channel1
+  agent_foo.sources.source1.channels =  channel1
+  agent_foo.sources.source1.type = SEQ
+  agent_foo.sources.source1.interceptors = inter1
+  agent_foo.sources.source1.interceptors.inter1.type = static
+  agent_foo.sources.source1.interceptors.inter1.key = datacenter
+  agent_foo.sources.source1.interceptors.inter1.value = NEW_YORK
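+
+Chaining works because each static interceptor adds at most one header. The interceptors of a
+source are assumed here to be applied in the order they are listed. The sketch below models such
+a chain in plain Java. ``StaticHeader`` and ``applyChain`` are hypothetical names, not Flume
+classes, and the ``rack`` header is an invented example. With ``preserveExisting`` set to true
+(its default in the table above), the first interceptor that writes a key wins.
+
+.. code-block:: java
+
+  import java.util.Arrays;
+  import java.util.LinkedHashMap;
+  import java.util.List;
+  import java.util.Map;
+
+  public class StaticChainSketch {
+
+    // One configured static interceptor: a single key/value pair.
+    static final class StaticHeader {
+      final String key;
+      final String value;
+      final boolean preserveExisting;
+
+      StaticHeader(String key, String value, boolean preserveExisting) {
+        this.key = key;
+        this.value = value;
+        this.preserveExisting = preserveExisting;
+      }
+
+      void apply(Map<String, String> headers) {
+        if (preserveExisting && headers.containsKey(key)) {
+          return; // keep the value that is already present
+        }
+        headers.put(key, value);
+      }
+    }
+
+    // Applies the interceptors one after another, in list order.
+    static Map<String, String> applyChain(List<StaticHeader> chain, Map<String, String> headers) {
+      for (StaticHeader interceptor : chain) {
+        interceptor.apply(headers);
+      }
+      return headers;
+    }
+
+    public static void main(String[] args) {
+      List<StaticHeader> chain = Arrays.asList(
+          new StaticHeader("datacenter", "NEW_YORK", true),
+          new StaticHeader("rack", "r12", true),
+          new StaticHeader("datacenter", "LONDON", true));
+      Map<String, String> headers = applyChain(chain, new LinkedHashMap<String, String>());
+      System.out.println(headers); // prints {datacenter=NEW_YORK, rack=r12}
+    }
+  }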
+
+Regex Filtering Interceptor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This interceptor filters events selectively by interpreting the event body as text and matching
+the text against a configured regular expression. The supplied regular expression can be used
+to include events or exclude events.
 
+================  =======  ========================================================================
+Property Name     Default  Description
+================  =======  ========================================================================
+**type**          --       The component type name, has to be ``REGEX_FILTER``
+regex             ".*"     Regular expression for matching against events
+excludeRegex      false    If true, regex determines events to exclude, otherwise regex determines events to include.
+================  =======  ========================================================================
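+
+The include/exclude logic can be sketched as follows. This is an illustration under stated
+assumptions, not the interceptor's source. Event bodies are plain byte arrays decoded as UTF-8,
+``Matcher.find()`` is used (so the pattern may match anywhere in the body), and the class and
+method names are hypothetical. An event is kept exactly when "the body matches" differs from
+``excludeRegex``.
+
+.. code-block:: java
+
+  import java.nio.charset.StandardCharsets;
+  import java.util.ArrayList;
+  import java.util.List;
+  import java.util.regex.Pattern;
+
+  public class RegexFilterSketch {
+
+    // Keeps matching bodies when excludeRegex is false and drops them when it is true.
+    static List<byte[]> filter(List<byte[]> bodies, String regex, boolean excludeRegex) {
+      Pattern pattern = Pattern.compile(regex);
+      List<byte[]> kept = new ArrayList<>();
+      for (byte[] body : bodies) {
+        // Assumption: bodies are UTF-8 text; find() accepts a match anywhere in the body.
+        boolean matches = pattern.matcher(new String(body, StandardCharsets.UTF_8)).find();
+        if (matches != excludeRegex) {
+          kept.add(body);
+        }
+      }
+      return kept;
+    }
+
+    public static void main(String[] args) {
+      List<byte[]> bodies = new ArrayList<>();
+      for (String line : new String[] {"ERROR disk full", "INFO started", "ERROR timeout"}) {
+        bodies.add(line.getBytes(StandardCharsets.UTF_8));
+      }
+      // Include mode: only the two ERROR events survive.
+      for (byte[] body : filter(bodies, "^ERROR", false)) {
+        System.out.println(new String(body, StandardCharsets.UTF_8));
+      }
+    }
+  }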
 
 Flume Properties
 ----------------
@@ -1685,7 +1775,6 @@ flume.called.from.service  --       If t
                                     -Dflume.called.from.service is enough)
 =========================  =======  ====================================================================
 
-
 Property: flume.called.from.service
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -1744,7 +1833,111 @@ configuring the HDFS sink Kerberos-relat
 Monitoring
 ==========
 
-TBD
+Monitoring in Flume is still a work in progress. Changes can happen very often.
+Several Flume components report metrics to the JMX platform MBean server. These
+metrics can be queried using JConsole.
+
+Ganglia Reporting
+-----------------
+Flume can also report these metrics to Ganglia 3 or Ganglia 3.1 metanodes. To report
+metrics to Ganglia, a Flume agent must be started with this support enabled by passing in
+the following parameters as system properties prefixed by ``flume.monitoring.``. They
+can also be specified in flume-env.sh:
+
+=======================  =======  =====================================================================================
+Property Name            Default  Description
+=======================  =======  =====================================================================================
+**type**                 --       The component type name, has to be ``GANGLIA``
+**hosts**                --       Comma separated list of ``hostname:port``
+pollInterval             60       Time, in seconds, between consecutive reports to the Ganglia server
+isGanglia3               false    Set to true if the Ganglia server version is 3. By default, Flume sends metrics in Ganglia 3.1 format
+=======================  =======  =====================================================================================
+
+We can start Flume with Ganglia support as follows::
+
+  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=GANGLIA -Dflume.monitoring.hosts=com.example:1234,com.example2:5455
+
+Any custom Flume components should use Java MBean ObjectNames that begin
+with ``org.apache.flume`` for Flume to report their metrics to Ganglia. This can
+be done by registering the MBean under such an ObjectName as follows (the name can be
+anything provided it starts with ``org.apache.flume``):
+
+.. code-block:: java
+
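+  import java.lang.management.ManagementFactory;
+  import javax.management.ObjectName;
+
+  // Inside the component: myClassName and name are placeholders chosen by its author.
+  ObjectName objName = new ObjectName("org.apache.flume." + myClassName + ":type=" + name);
+  ManagementFactory.getPlatformMBeanServer().registerMBean(this, objName);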
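+
+For a fuller picture, the following self-contained sketch registers a hypothetical component
+with the platform MBean server using only the standard JMX API. ``EventCounterMXBean``,
+``EventCounter`` and the domain ``org.apache.flume.example`` are illustrative names, not Flume
+classes. An MXBean interface is used here for convenience, and its getter ``getEventCount()``
+appears as the attribute ``EventCount`` that a reporter would read.
+
+.. code-block:: java
+
+  import java.lang.management.ManagementFactory;
+  import java.util.concurrent.atomic.AtomicLong;
+  import javax.management.JMException;
+  import javax.management.ObjectName;
+
+  public class FlumeMBeanExample {
+
+    // Hypothetical management interface. The "MXBean" suffix makes JMX expose
+    // getEventCount() as a read-only attribute named EventCount.
+    public interface EventCounterMXBean {
+      long getEventCount();
+    }
+
+    // Hypothetical component that counts the events it handles.
+    public static class EventCounter implements EventCounterMXBean {
+      private final AtomicLong count = new AtomicLong();
+
+      public void increment() {
+        count.incrementAndGet();
+      }
+
+      @Override
+      public long getEventCount() {
+        return count.get();
+      }
+    }
+
+    // Registers the component under a domain that starts with "org.apache.flume".
+    public static ObjectName register(EventCounter counter, String name) {
+      try {
+        ObjectName objName = new ObjectName("org.apache.flume.example:type=" + name);
+        ManagementFactory.getPlatformMBeanServer().registerMBean(counter, objName);
+        return objName;
+      } catch (JMException e) {
+        throw new IllegalStateException("MBean registration failed", e);
+      }
+    }
+
+    // Reads one attribute back from the platform MBean server.
+    public static Object readAttribute(ObjectName objName, String attribute) {
+      try {
+        return ManagementFactory.getPlatformMBeanServer().getAttribute(objName, attribute);
+      } catch (JMException e) {
+        throw new IllegalStateException("attribute read failed", e);
+      }
+    }
+
+    public static void main(String[] args) {
+      EventCounter counter = new EventCounter();
+      ObjectName objName = register(counter, "myCounter");
+      counter.increment();
+      System.out.println(objName + " EventCount=" + readAttribute(objName, "EventCount"));
+    }
+  }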
+
+JSON Reporting
+--------------
+Flume can also report metrics in a JSON format. To enable reporting in JSON format, Flume hosts
+a web server on a configurable port. Flume reports metrics in the following JSON format:
+
+.. code-block:: javascript
+
+  {
+  "typeName1.componentName1" : {"metric1" : "metricValue1", "metric2" : "metricValue2"},
+  "typeName2.componentName2" : {"metric3" : "metricValue3", "metric4" : "metricValue4"}
+  }
+
+Here is an example:
+
+.. code-block:: javascript
+
+  {
+  "CHANNEL.fileChannel":{"EventPutSuccessCount":"468085",
+                        "Type":"CHANNEL",
+                        "StopTime":"0",
+                        "EventPutAttemptCount":"468086",
+                        "ChannelSize":"233428",
+                        "StartTime":"1344882233070",
+                        "EventTakeSuccessCount":"458200",
+                        "ChannelCapacity":"600000",
+                        "EventTakeAttemptCount":"458288"},
+  "CHANNEL.memChannel":{"EventPutSuccessCount":"22948908",
+                     "Type":"CHANNEL",
+                     "StopTime":"0",
+                     "EventPutAttemptCount":"22948908",
+                     "ChannelSize":"5",
+                     "StartTime":"1344882209413",
+                     "EventTakeSuccessCount":"22948900",
+                     "ChannelCapacity":"100",
+                     "EventTakeAttemptCount":"22948908"}
+  }
+
+=======================  =======  =====================================================================================
+Property Name            Default  Description
+=======================  =======  =====================================================================================
+**type**                 --       The component type name, has to be ``HTTP``
+port                     41414    The port to start the server on.
+=======================  =======  =====================================================================================
+
+We can start Flume with JSON Reporting support as follows::
+
+  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=HTTP -Dflume.monitoring.port=34545
+
+Metrics will then be available at the **http://<hostname>:<port>/metrics** webpage.
+Custom components can report metrics as mentioned in the Ganglia section above.
+
+Custom Reporting
+----------------
+It is possible to report metrics to other systems by writing servers that do
+the reporting. Any reporting class has to implement the interface
+``org.apache.flume.instrumentation.MonitorService``. Such a class can be used
+the same way the GangliaServer is used for reporting. It can poll the platform
+MBean server to read the metrics from the MBeans. For example, an HTTP
+monitoring service called ``HTTPReporting`` can be used as follows::
+
+  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=com.example.reporting.HTTPReporting -Dflume.monitoring.node=com.example:332
+
+=======================  =======  ========================================
+Property Name            Default  Description
+=======================  =======  ========================================
+**type**                 --       The component type name, has to be FQCN
+=======================  =======  ========================================
+
+
 
 Troubleshooting
 ===============
@@ -1791,37 +1984,40 @@ TBD
 Component Summary
 =================
 
-================================  ==================  ====================================================================
-Component Interface               Type                Implementation Class
-================================  ==================  ====================================================================
-org.apache.flume.Channel          MEMORY              org.apache.flume.channel.MemoryChannel
-org.apache.flume.Channel          JDBC                org.apache.flume.channel.jdbc.JdbcChannel
-org.apache.flume.Channel          --                  org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
-org.apache.flume.Channel          FILE                org.apache.flume.channel.file.FileChannel
-org.apache.flume.Channel          --                  org.apache.flume.channel.PseudoTxnMemoryChannel
-org.apache.flume.Channel          --                  org.example.MyChannel
-org.apache.flume.Source           AVRO
-org.apache.flume.Source           NETCAT
-org.apache.flume.Source           SEQ
-org.apache.flume.Source           EXEC
-org.apache.flume.Source           SYSLOGTCP
-org.apache.flume.Source           SYSLOGUDP
-org.apache.flume.Source           --                  org.apache.flume.source.avroLegacy.AvroLegacySource
-org.apache.flume.Source           --                  org.apache.flume.source.thriftLegacy.ThriftLegacySource
-org.apache.flume.Source           --                  org.example.MySource
-org.apache.flume.Sink             NULL                org.apache.flume.sink.NullSink
-org.apache.flume.Sink             LOGGER              org.apache.flume.sink.LoggerSink
-org.apache.flume.Sink             AVRO                org.apache.flume.sink.AvroSink
-org.apache.flume.Sink             HDFS                org.apache.flume.sink.hdfs.HDFSEventSink
-org.apache.flume.Sink             --                  org.apache.flume.sink.hbase.HBaseSink
-org.apache.flume.Sink             --                  org.apache.flume.sink.hbase.AsyncHBaseSink
-org.apache.flume.Sink             FILE_ROLL           org.apache.flume.sink.RollingFileSink
-org.apache.flume.Sink             IRC                 org.apache.flume.sink.irc.IRCSink
-org.apache.flume.Sink             --                  org.example.MySink
-org.apache.flume.ChannelSelector  REPLICATING         org.apache.flume.channel.ReplicatingChannelSelector
-org.apache.flume.ChannelSelector  MULTIPLEXING        org.apache.flume.channel.MultiplexingChannelSelector
-org.apache.flume.ChannelSelector  --                  org.example.MyChannelSelector
-org.apache.flume.SinkProcessor    DEFAULT             org.apache.flume.sink.DefaultSinkProcessor
-org.apache.flume.SinkProcessor    FAILOVER            org.apache.flume.sink.FailoverSinkProcessor
-org.apache.flume.SinkProcessor    LOAD_BALANCE        org.apache.flume.sink.LoadBalancingSinkProcessor
-================================  ==================  ====================================================================
+========================================  ==================  ====================================================================
+Component Interface                       Type                Implementation Class
+========================================  ==================  ====================================================================
+org.apache.flume.Channel                  MEMORY              org.apache.flume.channel.MemoryChannel
+org.apache.flume.Channel                  JDBC                org.apache.flume.channel.jdbc.JdbcChannel
+org.apache.flume.Channel                  --                  org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
+org.apache.flume.Channel                  FILE                org.apache.flume.channel.file.FileChannel
+org.apache.flume.Channel                  --                  org.apache.flume.channel.PseudoTxnMemoryChannel
+org.apache.flume.Channel                  --                  org.example.MyChannel
+org.apache.flume.Source                   AVRO                org.apache.flume.source.AvroSource
+org.apache.flume.Source                   NETCAT              org.apache.flume.source.NetcatSource
+org.apache.flume.Source                   SEQ                 org.apache.flume.source.SequenceGeneratorSource
+org.apache.flume.Source                   EXEC                org.apache.flume.source.ExecSource
+org.apache.flume.Source                   SYSLOGTCP           org.apache.flume.source.SyslogTcpSource
+org.apache.flume.Source                   SYSLOGUDP           org.apache.flume.source.SyslogUDPSource
+org.apache.flume.Source                   --                  org.apache.flume.source.avroLegacy.AvroLegacySource
+org.apache.flume.Source                   --                  org.apache.flume.source.thriftLegacy.ThriftLegacySource
+org.apache.flume.Source                   --                  org.example.MySource
+org.apache.flume.Sink                     NULL                org.apache.flume.sink.NullSink
+org.apache.flume.Sink                     LOGGER              org.apache.flume.sink.LoggerSink
+org.apache.flume.Sink                     AVRO                org.apache.flume.sink.AvroSink
+org.apache.flume.Sink                     HDFS                org.apache.flume.sink.hdfs.HDFSEventSink
+org.apache.flume.Sink                     --                  org.apache.flume.sink.hbase.HBaseSink
+org.apache.flume.Sink                     --                  org.apache.flume.sink.hbase.AsyncHBaseSink
+org.apache.flume.Sink                     FILE_ROLL           org.apache.flume.sink.RollingFileSink
+org.apache.flume.Sink                     IRC                 org.apache.flume.sink.irc.IRCSink
+org.apache.flume.Sink                     --                  org.example.MySink
+org.apache.flume.ChannelSelector          REPLICATING         org.apache.flume.channel.ReplicatingChannelSelector
+org.apache.flume.ChannelSelector          MULTIPLEXING        org.apache.flume.channel.MultiplexingChannelSelector
+org.apache.flume.ChannelSelector          --                  org.example.MyChannelSelector
+org.apache.flume.SinkProcessor            DEFAULT             org.apache.flume.sink.DefaultSinkProcessor
+org.apache.flume.SinkProcessor            FAILOVER            org.apache.flume.sink.FailoverSinkProcessor
+org.apache.flume.SinkProcessor            LOAD_BALANCE        org.apache.flume.sink.LoadBalancingSinkProcessor
+org.apache.flume.interceptor.Interceptor  TIMESTAMP           org.apache.flume.interceptor.TimestampInterceptor$Builder
+org.apache.flume.interceptor.Interceptor  HOST                org.apache.flume.interceptor.HostInterceptor$Builder
+org.apache.flume.interceptor.Interceptor  STATIC              org.apache.flume.interceptor.StaticInterceptor$Builder
+========================================  ==================  ====================================================================


