flume-user mailing list archives

From Cameron Gandevia <cgande...@gmail.com>
Subject FlumeNG Performance Questions
Date Wed, 07 Nov 2012 02:10:33 GMT
Hi

I am trying to transition some Flume nodes from FlumeOG to FlumeNG but
am running into a few difficulties. We are writing around 16,000 events/s
from a number of FlumeOG agents to a single FlumeNG agent, but the FlumeNG
agent cannot drain the memory channel fast enough. At first I thought we
might be reaching the limit of a single Flume agent, but I get similar
performance using a file channel, which doesn't make sense to me.

I have tried configuring anywhere from a single HDFS sink up to twenty of
them, and I have tried batch sizes from 1,000 up to 100,000, but no matter
what I do the channel fills fairly quickly.

I am running a single flow using the configuration below:

${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.type = memory
${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.capacity = 1000000
${FLUME_COLLECTOR_ID}.channels.hdfs-memoryChannel.transactionCapacity = 100000

${FLUME_COLLECTOR_ID}.sources.perf_legacysource.type = org.apache.flume.source.thriftLegacy.ThriftLegacySource
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.host = 0.0.0.0
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.port = 36892
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.channels = hdfs-memoryChannel
${FLUME_COLLECTOR_ID}.sources.perf_legacysource.selector.type = replicating

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.codeC = com.hadoop.compression.lzo.LzopCodec
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.fileType = CompressedStream
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollInterval = 300
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollSize = 0
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.threadsPoolSize = 10
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.rollCount = 0
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.batchSize = 50000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.callTimeout = 120000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.txnEventMax = 1000
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.serializer = text
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink.channel = hdfs-memoryChannel
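
For reference, the multi-sink runs mentioned above amount to repeating the
sink stanza against the same channel. The fragment below is only a rough
sketch of a two-sink variant, not the exact file that was tested: the sink
names hdfs-sink-1 and hdfs-sink-2 and the "_2" file prefix are hypothetical,
and the remaining hdfs.* settings are assumed to be copied unchanged from the
single-sink stanza.

```properties
# Hypothetical two-sink variant; sink names are illustrative only.
${FLUME_COLLECTOR_ID}.sinks = hdfs-sink-1 hdfs-sink-2

# Both sinks read from the same channel, so each takes its own batches.
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.channel = hdfs-memoryChannel
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
# Assumed: a distinct prefix per sink so the sinks do not write the same file names.
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_1
# Kept at or below the channel's transactionCapacity (100000 above).
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-1.hdfs.batchSize = 50000

${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.type = hdfs
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.channel = hdfs-memoryChannel
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.hdfs.path = hdfs://${HADOOP_NAMENODE}:8020/rawLogs/%Y-%m-%d/%H00
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.hdfs.filePrefix = ${FLUME_COLLECTOR_ID}_2
${FLUME_COLLECTOR_ID}.sinks.hdfs-sink-2.hdfs.batchSize = 50000
```

As a rough sense of scale, at about 16,000 events/s a channel with a capacity
of 1,000,000 would fill in roughly a minute if nothing were being drained.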

Thanks

Cameron Gandevia
