flume-user mailing list archives

From Javi Roman <javiro...@redoop.org>
Subject Re: Collecting thousands of sources
Date Fri, 05 Sep 2014 07:14:03 GMT

The scenario Juanfra is describing is related to the question I asked
a few days ago [1].

You cannot install Flume agents on the SNMP-managed devices, and you
cannot modify any software on an SNMP-managed device to use the Flume
client SDK (if I understand your idea correctly, Ashish). There are two
ways to collect SNMP data from SNMP devices using Flume (IMHO):

1. Create a custom application that sends the SNMP queries to the
thousands of devices and logs the answers to a file. In this case
Flume can follow this file with the "exec source" core plugin (tail).

2. Use a flume-snmp-source plugin (similar to [2]); in other words,
move the custom SNMP query application into a specialized Flume
source.
Juanfra is talking about a scenario like the second option. In that
case you have to handle a huge Flume configuration file, with an entry
for each managed device to query. For this situation I guess there are
two possible solutions:

1. The flume-snmp-source plugin could take a file with a list of hosts to
query as a parameter:

agent.sources.source1.host = /path/to/list-of-host-file

However, I guess this breaks with the philosophy and simplicity of the
other core plugins of Flume.

2. Create a little program that generates the Flume configuration file
from a template, or something similar.
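
A minimal sketch of what such a generator could look like in Python,
using string.Template from the standard library. The source type
"org.example.SnmpSource" and the channel name "ch1" are placeholders I
made up for illustration (use the real class name of your plugin), and
the "host" property simply mirrors the example above. Channel and sink
definitions are left out.

```python
from string import Template

# Per-device block; ${...} fields are filled in for every host.
SOURCE_TEMPLATE = Template(
    "${agent}.sources.${name}.type = ${source_type}\n"
    "${agent}.sources.${name}.host = ${host}\n"
    "${agent}.sources.${name}.channels = ${channel}\n"
)


def render_config(hosts,
                  agent="agent",
                  source_type="org.example.SnmpSource",  # placeholder class
                  channel="ch1"):                        # placeholder channel
    """Return flume.conf text with one source entry per host."""
    names = [f"source{i}" for i in range(1, len(hosts) + 1)]
    # First line declares all source names: agent.sources = source1 source2 ...
    blocks = [f"{agent}.sources = {' '.join(names)}\n"]
    for name, host in zip(names, hosts):
        blocks.append(SOURCE_TEMPLATE.substitute(
            agent=agent, name=name, source_type=source_type,
            host=host, channel=channel))
    # Separate the blocks with a blank line for readability.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_config(["10.0.0.1", "10.0.0.2"]))
```

The host list could come from a plain file or an inventory database, and
a configuration management tool could run the generator and deploy the
result.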

Any other ideas? I guess this is a good discussion about a real-world use case.

[1] http://mail-archives.apache.org/mod_mbox/flume-user/201409.mbox/browser
[2] https://github.com/javiroman/flume-snmp-source

On Fri, Sep 5, 2014 at 4:56 AM, Ashish <paliwalashish@gmail.com> wrote:
> Have a look at the Flume Client SDK. One simple way would be to use Flume client implementations
> to send Events to Flume Sources; this would significantly reduce the number of Sources you
> need to manage.
> HTH !
> On Thu, Sep 4, 2014 at 9:40 PM, JuanFra Rodriguez Cardoso <juanfra.rodriguez.cardoso@gmail.com>
>> Thanks Andrew for your quick response.
>> My sources (server PUD) can't put events into an aggregation point. For this reason
>> I'm following a PollingSource schema where my agent needs to be configured with thousands
>> of sources. Any clues for use cases where data is injected considering a polling process?
>> Regards!
>> ---
>> JuanFra Rodriguez Cardoso
>> 2014-09-04 17:41 GMT+02:00 Andrew Ehrlich <andrew@aehrlich.com>:
>>> One way to avoid managing so many sources would be to have an aggregation point
>>> between the data generators and the Flume sources. For example, maybe you could have the data
>>> generators put events into a message queue(s), then have Flume consume from there?
>>> Andrew
>>> ---- On Thu, 04 Sep 2014 08:29:04 -0700 JuanFra Rodriguez Cardoso<juanfra.rodriguez.cardoso@gmail.com> wrote ----
>>> Hi all:
>>> Considering an environment with thousands of sources, what are the best practices
>>> for managing the agent configuration (flume.conf)? Is it recommended to create a multi-layer
>>> topology where each agent takes control of a subset of sources?
>>> In that case, a conf mgmt server (such as Puppet) would be responsible for editing
>>> flume.conf with parameters 'agent.sources' from source1 to source3000 (assuming we have 3000
>>> source machines).
>>> Are my thoughts aligned with such scenarios of large-scale data ingest?
>>> Thanks a lot!
>>> ---
>>> JuanFra
> --
> thanks
> ashish
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
