flume-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karthikeyan Muthukumarasamy <mkarthik.h...@gmail.com>
Subject Re: Flume latency issue
Date Fri, 28 Sep 2012 04:23:17 GMT
Hi Mike,
Thanks for the response!
Im use flume-ng-1.2.0 version.

In the prototype that Im building, the final consolidated sink writes to a
file. I intend to extend this with more specific sinks like HBase, JMX etc.
While Im writing this mail, I get a doubt if the latency is caused by some
buffering happening at the final FILE_ROLL sink!
I have test scripts loading messages with timestamp every one second to the
files hblog, zklog & applog.
I expect the final consolidated sink's output to be something like this:
10:00:00 hblaselog entry
10:00:00 zklog entry
10:00:00 applog entry
10:00:01 hblaselog entry
10:00:01 zklog entry
10:00:01 applog entry
10:00:02 hblaselog entry
10:00:02 zklog entry
10:00:02 applog entry
combos like above...

But in reality, some batching seems to be occuring in between and once
every 10 secs, I see that the consolidated sink writes chunks (from each
src) to the output file as follows:
10:00:00 hblaselog entry
10:00:01 hblaselog entry
10:00:02 hblaselog entry
(17 more like this)
10:00:00 zklog entry
10:00:01 zklog entry
10:00:02 zklog entry
(17 more like this)
10:00:00 applog entry
10:00:01 applog entry
10:00:02 applog entry

My Flume conf file is as below:
# example.conf: A single-node Flume configuration

# Name the components on this agent
agent1.sources = hbase-src zk-src app-src consolidated-src
agent1.sinks = hbase-sink zk-sink app-sink consolidated-sink
agent1.channels = hbase-chn zk-chn app-chn consolidated-chn

# All channels are in-memory channels
agent1.channels.hbase-chn.type = memory
agent1.channels.zk-chn.type = memory
agent1.channels.app-chn.type = memory
agent1.channels.consolidated-chn.type = memory

# Describe/configure hbase-src
agent1.sources.hbase-src.type = exec
agent1.sources.hbase-src.command = tail -F /home/efhjlns/scripts/hblog
agent1.sources.hbase-src.channels = hbase-chn

# Describe avro hbase-sink
agent1.sinks.hbase-sink.type = avro
agent1.sinks.hbase-sink.hostname = localhost
agent1.sinks.hbase-sink.port = 15001
agent1.sinks.hbase-sink.channel = hbase-chn

# Describe/configure zk-src
agent1.sources.zk-src.type = exec
agent1.sources.zk-src.command = tail -F /home/efhjlns/scripts/zklog
agent1.sources.zk-src.channels = zk-chn

# Describe avro zk-sink
agent1.sinks.zk-sink.type = avro
agent1.sinks.zk-sink.hostname = localhost
agent1.sinks.zk-sink.port = 15001
agent1.sinks.zk-sink.channel = zk-chn

# Describe/configure app-src
agent1.sources.app-src.type = exec
agent1.sources.app-src.command = tail -F /home/efhjlns/scripts/applog
agent1.sources.app-src.channels = app-chn

# Describe avro app-sink
agent1.sinks.app-sink.type = avro
agent1.sinks.app-sink.hostname = localhost
agent1.sinks.app-sink.port = 15001
agent1.sinks.app-sink.channel = app-chn

# Describe/configure consolidated-src
agent1.sources.consolidated-src.type = avro
agent1.sources.consolidated-src.bind = localhost
agent1.sources.consolidated-src.port = 15001
agent1.sources.consolidated-src.channels = consolidated-chn

# Describe consolidated file sink

agent1.sinks.consolidated-sink.type = FILE_ROLL
agent1.sinks.consolidated-sink.sink.directory = /home/efhjlns/flume-opt-dir
agent1.sinks.consolidated-sink.sink.rollInterval = 0
agent1.sinks.consolidated-sink.channel = consolidated-chn

Thanks & Regards

On Fri, Sep 28, 2012 at 2:38 AM, Mike Percy <mpercy@apache.org> wrote:

> MK,
> Which version of Flume are you using?
> This design sounds reasonable to me.
> Regarding your question about 20 second latency, that should not be true.
> Can you please confirm that you have observed this? If you have, please
> attach your flume.conf and we can take a look. But it should be nearly
> instantaneous - the batch sizes are controlled by the client or sink, and
> in general the sink takes as much as it can from the channel, up to the
> batch size, but if it takes less we continue immediately regardless. The
> only time we "back off" is when the channel is empty *and* no events were
> taken in the current batch - then the sink runner goes into an exponential
> backoff.
> Regards,
> Mike
> On Thu, Sep 27, 2012 at 6:55 AM, Karthikeyan Muthukumarasamy <
> mkarthik.here@gmail.com> wrote:
>> Hi,
>> In my project various applications and 3PPs write log into to their
>> separate logfiles.
>> There are two limitations with this:
>> - the structure of the log messages are different in each log file
>> - the log messages are in different files and I cant get a single time
>> sorted display of all log messages, which is important in some debug
>> situations
>> As a solution to this problem, I intend to:
>> - use separate flume sources to tail various log files in the system
>> - have interceptors for each type of flume source and convert all log
>> messages to a common structure
>> - all flume sinks will write to a localhost avro port
>> - a separate flume source will read from the avro port on localhost
>> - there will be a fan-out logic to post the data from that source to
>> multiple channels
>> - each connel is connected to a separate sink like JMX sink, HBase Sink
>> etc
>> First of all, is this kind of usage of flume acceptable and is there
>> anything I need to specifically take care of?
>> I also notice that the consolidated avro source which reads data from
>> avro port gets data only as blocks from each source, the latency is around
>> 20 seconds. Is it possible to reduce this latency, so at the consolidated
>> avro source, I receive all events as they are getting logged into their log
>> files, instantaneously?
>> Thanks in advance!
>> MK

View raw message