From "Mangtani, Kushal" <Kushal.Mangt...@viasat.com>
Subject RE: FW: Memory Channel gets full.. Avro Sinks cannot drain the events at a fast rate
Date Fri, 02 May 2014 20:46:23 GMT
I have never tried this. Will do in my test env and share the results soon.

Just FYI: I always thought only one sink instance could be attached to a channel at a time,
so two sinks could not read from the same channel. However, from the sound of your response,
I might be wrong.

From: Jeff Lord [mailto:jlord@cloudera.com]
Sent: Friday, May 02, 2014 7:55 AM
To: user@flume.apache.org
Subject: Re: FW: Memory Channel gets full.. Avro Sinks cannot drain the events at a fast rate


Have you considered removing the sinks from the sink group?
This will increase your concurrency for processing channel events by allowing both sinks to
read from the channel simultaneously. With a sink group in place, only one sink reads at
a time.
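A minimal sketch of what this suggestion could look like, assuming the rest of the agent configuration quoted below stays unchanged: the `agent.sinkgroups` block is simply dropped. Without a sink group, Flume drives each sink with its own runner thread, so k1 and k2 each take batches from c1 independently. The host, port, and batch-size values are copied from the original configuration, not tuned recommendations.

```properties
# Sketch: no agent.sinkgroups entries at all, so k1 and k2 are not
# scheduled one-at-a-time by a load_balance processor; each sink runs
# in its own thread and takes events from c1 concurrently.
agent.sinks = k1 k2

agent.sinks.k1.type = avro
agent.sinks.k1.channel = c1
agent.sinks.k1.hostname = machine-1
agent.sinks.k1.port = 5300
agent.sinks.k1.batch-size = 10000

agent.sinks.k2.type = avro
agent.sinks.k2.channel = c1
agent.sinks.k2.hostname = machine-2
agent.sinks.k2.port = 5300
agent.sinks.k2.batch-size = 10000
```

The trade-off is that the load_balance processor's backoff and selector settings no longer apply to these two sinks.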

Hope this helps.


On Fri, May 2, 2014 at 2:31 AM, Mangtani, Kushal <Kushal.Mangtani@viasat.com> wrote:


I'm using the Flume-NG 1.4 (CDH 4.4) tarball to collect aggregated logs.
I am running a two-tier (agent, collector) Flume configuration with custom plugins. There are
approximately 20 agent machines (receiving data) and 6 collector machines (writing to HDFS),
all running independently. However, the channel in the agent cannot keep up with the incoming
events, so it fills up and drops events.

Key Points:

1.       The input rate is 2,000 events/sec and the average event size is 2 KB. At peak, we have
4 MB/sec of input traffic.

2.       After some debugging, we inferred that the sink was not draining events fast enough:

a.       We tried changing the sink from Avro to Thrift.

b.      We also tried to increase parallelism in the agent's channels and sinks, so we used
channel multiplexing to distribute the traffic across multiple channels instead of one.
However, neither 2a) nor 2b) helped.

3.       I have set -Xms and -Xmx to 1 GB and 8 GB, respectively.
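The multiplexed variant from 2b) is not included in the message. As a rough sketch of how such a setup is usually expressed in Flume properties, assuming a hypothetical event header named `route` (which would have to be set upstream, for example by one of the custom interceptors) with hypothetical values `a` and `b`:

```properties
# Sketch only: "route", "a" and "b" are hypothetical; the real header
# and values used in 2b) are not shown in the original message.
agent.channels = c1 c2
agent.channels.c2.type = memory
agent.sources.r1.channels = c1 c2

# Route each event to one channel based on the value of its "route" header
agent.sources.r1.selector.type = multiplexing
agent.sources.r1.selector.header = route
agent.sources.r1.selector.mapping.a = c1
agent.sources.r1.selector.mapping.b = c2
# Events with a missing or unmapped header go to c1
agent.sources.r1.selector.default = c1

# Each channel needs its own sink to drain it
agent.sinks.k1.channel = c1
agent.sinks.k2.channel = c2
```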

Agent Conf:

# Name the components on this agent
agent.sources = r1
agent.channels = c1
agent.sinks = k1 k2

# Describe/configure the source
agent.sources.r1.type = CustomSource-1
agent.sources.r1.port = 4000
agent.sources.r1.containsVersion = true
agent.sources.r1.channels = c1
agent.sources.r1.interceptors = i1 i2
agent.sources.r1.interceptors.i1.type = CustomInterceptor-1
agent.sources.r1.interceptors.i1.schemaFolder = /usr/lib/flume-ng/schema
agent.sources.r1.interceptors.i1.discardEventsAfterDays = 7
agent.sources.r1.interceptors.i2.type = CustomInterceptor-2
agent.sources.r1.interceptors.i2.schemaFolder = /usr/lib/flume-ng/schema
agent.sources.r1.interceptors.i2.optoutCron = 0 * * * *

# Use a channel which buffers events in memory
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000000
agent.channels.c1.transactionCapacity = 10000

#Load balancing sink group
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = k1 k2
agent.sinkgroups.g1.processor.type = load_balance
agent.sinkgroups.g1.processor.backoff = true
agent.sinkgroups.g1.processor.selector = random
agent.sinkgroups.g1.processor.selector.maxTimeOut = 64000

# Describe the sink k1
agent.sinks.k1.type = avro
agent.sinks.k1.channel = c1
agent.sinks.k1.hostname = machine-1
agent.sinks.k1.port = 5300
agent.sinks.k1.batch-size = 10000

# Describe the sink k2
agent.sinks.k2.type = avro
agent.sinks.k2.channel = c1
agent.sinks.k2.hostname = machine-2
agent.sinks.k2.port = 5300
agent.sinks.k2.batch-size = 10000

FYI: I have tried a lot of tuning of the channel's transactionCapacity and the sinks' batch-size;
eventually we settled on a value of 10,000 for both properties.

1.       Could you tell me how I can increase the drain rate of the channel so that it never
gets full? Ideally, we want the sinks to drain events from the channel at the same rate at
which they are being put into it.

Any input or suggestions would be greatly appreciated.

Kushal Mangtani
Software Engineer

