flume-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Flume + Kafka, Some results.
Date Wed, 04 Mar 2015 16:20:28 GMT
Hello,

We're running some tests with Kafka and Flume.

We have Kafka and Flume installed on four machines, and there are 8
DataNodes installed on other machines.
We have developed an injector that writes to Kafka, and we want to read
the messages with Flume. We have been trying these configurations:

Injector --> Kafka --> Flume Source --> Memory Channel --> HDFS Sink
Injector --> Kafka Channel --> HDFS Sink

We start Flume once our injector has finished injecting 1M messages
of 1024 bytes, and we measure how many messages are processed per
second, i.e. the time from reading them from Kafka until writing them
to HDFS.

Kafka --> Flume Source --> Memory Channel --> HDFS Sink
A. 1 agent, one topic with 4 partitions: 1 min 53 s, 8849 msg/s
B. 1 agent, one topic with 8 partitions: 1 min 47 s, 9345 msg/s
C. 4 agents, one topic with 4 partitions, one agent per partition:
   1 min 12 s, 13888 msg/s
D. 4 agents, one topic with 8 partitions, one agent for every two
   partitions: 46 s, 21739 msg/s
E. 4 agents, one topic with 12 partitions, one agent for every three
   partitions: 50 s, 20000 msg/s

Kafka Channel --> HDFS Sink
F. 1 agent, one topic with one partition: 2 min 50 s, 5882 msg/s
G. 1 agent, one topic with 4 partitions: 3 min, 5555 msg/s
H. 4 agents, 4 partitions, one agent per partition: 46 s, 21739 msg/s
   (Kafka channel, no source)
K. 4 agents, 8 partitions, one agent for every two partitions: 69 s,
   14925 msg/s (Kafka channel, no source)
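As a cross-check, the msg/s figures can be recomputed from the elapsed times. This is a small Python sketch that assumes "1M" means 1,000,000 messages (which matches A: 1,000,000 / 113 s is about 8849). The MiB/s value is an extra figure derived from the stated 1024-byte message size, and the names `ELAPSED_SECONDS` and `throughput` are only illustrative. With these inputs, A through H reproduce the reported rates, but K's 69 s works out to about 14492 msg/s rather than 14925, so either the time or the rate for K may contain a typo.

```python
# Elapsed times as reported in the results above, converted to seconds.
ELAPSED_SECONDS = {
    "A": 1 * 60 + 53,
    "B": 1 * 60 + 47,
    "C": 1 * 60 + 12,
    "D": 46,
    "E": 50,
    "F": 2 * 60 + 50,
    "G": 3 * 60,
    "H": 46,
    "K": 69,
}

TOTAL_MESSAGES = 1_000_000  # assumption: "1M" = 1,000,000 messages
MESSAGE_BYTES = 1024        # message size stated in the test description


def throughput(elapsed_s: int) -> tuple[int, float]:
    """Return (messages per second, truncated to an int) and MiB per second."""
    msgs_per_s = TOTAL_MESSAGES // elapsed_s
    mib_per_s = TOTAL_MESSAGES * MESSAGE_BYTES / elapsed_s / (1024 * 1024)
    return msgs_per_s, mib_per_s


if __name__ == "__main__":
    for test, seconds in ELAPSED_SECONDS.items():
        msgs, mib = throughput(seconds)
        print(f"{test}: {seconds:>3} s -> {msgs:>6} msg/s, {mib:5.1f} MiB/s")
```

Integer division is used because the reported rates appear to be truncated rather than rounded (for example 1,000,000 / 72 s gives 13888.9, reported as 13888).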

I'm confused by H and K.
I guess the sink is single-threaded, so you need at least as many HDFS
sinks as there are Kafka partitions. That's why H is roughly four times
faster than G.
The difference between D and K is strange. Could someone tell me the
reason? Is the KafkaSource single-threaded?
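The reasoning above treats each agent as a single-threaded consumer, so effective parallelism is capped at min(agents, partitions). A minimal Python sketch of how partitions might be split across agents in one consumer group, assuming a range-style assignment (each agent gets a contiguous block and the first agents take any remainder, similar in spirit to Kafka's range assignor). How Flume's Kafka source or channel actually assigns partitions depends on the Kafka consumer it uses; `assign_partitions` and `active_consumers` are hypothetical helpers, not Flume or Kafka APIs.

```python
def assign_partitions(num_partitions: int, num_agents: int) -> list[list[int]]:
    """Split partition ids 0..num_partitions-1 into contiguous blocks, one per agent.

    The first (num_partitions % num_agents) agents receive one extra partition.
    """
    per_agent, extra = divmod(num_partitions, num_agents)
    assignment = []
    start = 0
    for agent in range(num_agents):
        count = per_agent + (1 if agent < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment


def active_consumers(num_partitions: int, num_agents: int) -> int:
    """Number of agents that own at least one partition, i.e. min(agents, partitions)."""
    return sum(1 for parts in assign_partitions(num_partitions, num_agents) if parts)


if __name__ == "__main__":
    # Setups from the tests above: (label, partitions, agents)
    for label, partitions, agents in [("G", 4, 1), ("H", 4, 4), ("D/K", 8, 4), ("E", 12, 4)]:
        print(label, assign_partitions(partitions, agents),
              active_consumers(partitions, agents))
```

Under this simple model G has one active consumer and H has four, which fits the roughly fourfold gap, while D and K get identical assignments, so partition layout alone would not explain their difference.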

On the other hand, the number of messages per second seems pretty
low. We'll try to tune Flume with a bigger batchSize and other
parameters to improve performance. Any advice? I also thought about
trying a Null Sink to isolate Flume from HDFS.
