flume-user mailing list archives

From Jeff Lord <jl...@cloudera.com>
Subject Re: FW: Memory Channel gets full.. Avro Sinks cannot drain the events at a fast rate
Date Fri, 02 May 2014 14:54:46 GMT
Kushal,

Have you considered removing the sinks from the sink group?
That will increase your concurrency for processing channel events by
allowing both sinks to read from the channel simultaneously. With a sink
group in place, only one sink reads at a time.
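For example (a sketch based on the agent config quoted below, using the same property names; not part of the original message), the change amounts to deleting the sinkgroups block and leaving both sinks wired to the same channel:

```properties
# Delete the load-balancing sink group from the agent config:
#   agent.sinkgroups = g1
#   agent.sinkgroups.g1.sinks = k1 k2
#   agent.sinkgroups.g1.processor.type = load_balance
#   agent.sinkgroups.g1.processor.backoff = true
#   agent.sinkgroups.g1.processor.selector = random
#   agent.sinkgroups.g1.processor.selector.maxTimeOut = 64000

# Keep both sinks attached to the same channel. Without a sink group,
# each sink gets its own SinkRunner thread and drains c1 concurrently
# instead of taking turns through the sink processor.
agent.sinks = k1 k2
agent.sinks.k1.channel = c1
agent.sinks.k2.channel = c1
```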

Hope this helps.

-Jeff


On Fri, May 2, 2014 at 2:31 AM, Mangtani, Kushal <Kushal.Mangtani@viasat.com
> wrote:

>
>
> Hi,
>
>
>
> I'm using Flume-Ng 1.4 cdh4.4 Tarball for collecting aggregated logs.
>
> I am running a 2-tier (agent, collector) Flume configuration with custom
> plugins. There are approximately 20 agent (receiving data) and 6 collector
> (writing to HDFS) Flume machines, all running independently. However, the
> channel in the agent is not able to keep up with input events, causing the
> channel to get full and drop events.
>
>
>
> Key Points:
>
>
>
> 1.       Input rate is 2,000 events/sec; average size of each event is 2 KB.
> At peak, we have 4 MB/sec of input traffic
>
> 2.       After some debugging, we inferred that the sink was not draining
> events fast enough; so
>
> a.       We tried changing the sink from Avro to Thrift
>
> b.      We also decided to increase parallelism in the channels and sinks of
> the agent process, so we used channel multiplexing and distributed the traffic
> across multiple channels instead of one.
>
> However, neither 2a) nor 2b) above helped.
>
>
>
> 3.       I have set -Xms and -Xmx to 1 GB and 8 GB, respectively
>
>
>
>
>
> Agent Conf:
>
>
>
> # Name the components on this agent
>
> agent.sources = r1
>
> agent.channels = c1
>
> agent.sinks = k1 k2
>
>
>
> # Describe/configure the source
>
> agent.sources.r1.type = CustomSource-1
>
> agent.sources.r1.port = 4000
>
> agent.sources.r1.containsVersion = true
>
> agent.sources.r1.channels = c1
>
> agent.sources.r1.interceptors = i1 i2
>
> agent.sources.r1.interceptors.i1.type = CustomInterceptor-1
>
> agent.sources.r1.interceptors.i1.schemaFolder = /usr/lib/flume-ng/schema
>
> agent.sources.r1.interceptors.i1.discardEventsAfterDays = 7
>
> agent.sources.r1.interceptors.i2.type = CustomInterceptor-2
>
> agent.sources.r1.interceptors.i2.schemaFolder = /usr/lib/flume-ng/schema
>
> agent.sources.r1.interceptors.i2.optoutCron = 0 * * * *
>
>
>
> # Use a channel which buffers events in memory
>
> agent.channels.c1.type = memory
>
> agent.channels.c1.capacity = 1000000
>
> agent.channels.c1.transactionCapacity = 10000
>
>
>
>
>
> #Load balancing sink group
>
> agent.sinkgroups = g1
>
> agent.sinkgroups.g1.sinks = k1 k2
>
> agent.sinkgroups.g1.processor.type = load_balance
>
> agent.sinkgroups.g1.processor.backoff = true
>
> agent.sinkgroups.g1.processor.selector = random
>
> agent.sinkgroups.g1.processor.selector.maxTimeOut = 64000
>
>
>
> # Describe the sink k1
>
> agent.sinks.k1.type = avro
>
> agent.sinks.k1.channel = c1
>
> agent.sinks.k1.hostname = machine-1
>
> agent.sinks.k1.port = 5300
>
> agent.sinks.k1.batch-size = 10000
>
>
>
> # Describe the sink k2
>
> agent.sinks.k2.type = avro
>
> agent.sinks.k2.channel = c1
>
> agent.sinks.k2.hostname = machine-2
>
> agent.sinks.k2.port = 5300
>
> agent.sinks.k2.batch-size = 10000
>
>
>
> FYI: I have tried a lot of tweaking of channel.transactionCapacity and
> sink.batch-size; eventually we settled on a value of 10,000 for both
> properties.
>
> Questions:
>
> 1.       Could you tell me how I can increase the drain rate of the
> channel so that it never gets full? Ideally, we want the sink to drain
> events from the channel at the same rate at which they are put into the
> channel.
>
>
>
> Your inputs/suggestions will be greatly appreciated.
>
>
>
>
>
> Regards
>
> Kushal Mangtani
>
> Software Engineer
>
>
>
>
>
