flume-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Re: Flume + Kafka, Some results.
Date Thu, 05 Mar 2015 08:32:11 GMT
What could explain the difference between these two results?
C. 4 agents, one topic with 4 partitions, one agent per partition:
   1 min 12 s, 13888 msg/s
D. 4 agents, one topic with 8 partitions, one agent for every two
   partitions: 46 s, 21739 msg/s
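
As a sanity check on these figures, the throughput is just the 1M
injected messages divided by the elapsed seconds. A minimal Python
sketch that recomputes every row reported in this thread (the names
`RESULTS`, `throughput` and `mismatches` are only illustrative, and the
integer truncation is an assumption about how the numbers were rounded):

```python
# Recompute msg/s from the elapsed times reported in this thread.
TOTAL_MESSAGES = 1_000_000  # the injector writes 1M messages of 1024 bytes

# test label -> (elapsed seconds, throughput as reported in the thread)
RESULTS = {
    "A": (113, 8849),   # 1 min 53 s
    "B": (107, 9345),   # 1 min 47 s
    "C": (72, 13888),   # 1 min 12 s
    "D": (46, 21739),
    "E": (50, 20000),
    "F": (170, 5882),   # 2 min 50 s
    "G": (180, 5555),   # 3 min
    "H": (46, 21739),
    "K": (69, 14925),
}


def throughput(elapsed_s, total=TOTAL_MESSAGES):
    """Messages per second, truncated to an integer (assumed rounding)."""
    return total // elapsed_s


def mismatches(results=RESULTS):
    """Labels whose reported throughput differs from the recomputed one."""
    return [label for label, (secs, reported) in results.items()
            if throughput(secs) != reported]


if __name__ == "__main__":
    for label, (secs, reported) in RESULTS.items():
        print(label, secs, "s ->", throughput(secs), "msg/s, reported", reported)
    print("rows that do not reproduce:", mismatches())
```

Under that assumption every row reproduces except K: 69 s would give
14492 msg/s, while 14925 msg/s corresponds to about 67 s, so one of the
two K figures may be a typo. For C and D the ratio is 72 s / 46 s,
roughly 1.57x.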

If there's just one thread, why does the performance improve?
The reason for running more than one Flume agent was locality: if each
Flume agent is installed on machine A together with a Kafka on the same
machine, AgentA would be the one reading from the partitions of KafkaA,
so no data would cross the network in that step. I guess it isn't
possible for Flume to know that.
What I saw is that if I have 10 partitions, each time I start a new
agent, the partitions are redistributed among all the agents to
balance the load.
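
The rebalancing described above can be pictured with a small model.
This is only a sketch in Python that deals partitions out round-robin;
the function name `assign_partitions` and the agent names are
hypothetical, and Kafka's real assignment strategy may group partitions
differently:

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, agents: List[str]) -> Dict[str, List[int]]:
    """Spread partition ids over the running agents (illustrative model only)."""
    assignment: Dict[str, List[int]] = {agent: [] for agent in agents}
    for partition in range(num_partitions):
        # Deal partitions out one by one, wrapping around the agent list.
        owner = agents[partition % len(agents)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    agents: List[str] = []
    # Start the agents one at a time and recompute the assignment each time,
    # mimicking the redistribution seen when a new agent joins.
    for name in ["agentA", "agentB", "agentC", "agentD"]:
        agents.append(name)
        print(len(agents), "agent(s):", assign_partitions(10, agents))
```

Nothing in this model looks at which machine holds a partition, which
is consistent with the guess that an agent is not necessarily given the
partitions of its local Kafka.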

2015-03-04 17:59 GMT+01:00 Hari Shreedharan <hshreedharan@cloudera.com>:
> Sinks are single threaded. If you have more threads your performance will
> improve. And you are right in the sense that if you want to test the Kafka
> components then you should use null sink.
>
> Also note that all your sinks can be on the same agent, you don't need
> several agents just to have multiple sinks. Just have them configured to use
> the same channel.
>
> Thanks,
> Hari
>
>
> On Wed, Mar 4, 2015 at 8:20 AM, Guillermo Ortiz <konstt2000@gmail.com>
> wrote:
>>
>> Hello,
>>
>> We're doing some tests with Kafka-Flume.
>>
>> We have four Kafka and Flume instances installed. There are 8
>> DataNodes installed on other machines.
>> We have developed an injector for Kafka and want to read the messages
>> with Flume. We have been trying these configurations:
>>
>> Injector --> Kafka --> SourceFlume --> Memory Channel --> Sink HDFS
>> Injector --> Kafka Channel --> Sink HDFS
>>
>> We start Flume once our injector has finished injecting 1M messages
>> of 1024 bytes, and measure how many messages are processed per
>> second, that is, the time from reading them from Kafka until writing
>> them to HDFS.
>>
>> Kafka --> SourceFlume --> Memory Channel --> Sink HDFS
>> A. 1 agent, one topic with 4 partitions: 1 min 53 s, 8849 msg/s
>> B. 1 agent, one topic with 8 partitions: 1 min 47 s, 9345 msg/s
>> C. 4 agents, one topic with 4 partitions, one agent per partition:
>>    1 min 12 s, 13888 msg/s
>> D. 4 agents, one topic with 8 partitions, one agent for every two
>>    partitions: 46 s, 21739 msg/s
>> E. 4 agents, one topic with 12 partitions, one agent for every three
>>    partitions: 50 s, 20000 msg/s
>>
>> Kafka Channel --> Sink HDFS
>> F. 1 agent, one topic with one partition: 2 min 50 s, 5882 msg/s
>> G. 1 agent, one topic with 4 partitions: 3 min, 5555 msg/s
>> H. 4 agents, 4 partitions, one agent per partition: 46 s,
>>    21739 msg/s (Kafka channel, no source)
>> K. 4 agents, 8 partitions, one agent for every two partitions: 69 s,
>>    14925 msg/s (Kafka channel, no source)
>>
>> I'm confused by H and K.
>> I guess that the sink is single-threaded, so you need at least as
>> many HDFS sinks as partitions in Kafka. That's why H is four times
>> better than G.
>> The difference between D and K is odd. Could someone tell me the
>> reason? Is the KafkaSource single-threaded?
>>
>> On the other hand, the number of messages per second seems
>> pretty low. We'll try to tune Flume with a bigger batchSize and
>> other parameters to improve the performance. Any advice about it? I
>> also thought of trying a Null Sink to isolate Flume from HDFS.
>
>
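
Hari's suggestion of several sinks in one agent, all draining the same
channel, could be sketched as follows. The Python helper below only
generates the properties text; the agent, channel and sink names, the
HDFS path and the per-sink `hdfs.filePrefix` values are assumptions for
illustration, and the source definition is left out:

```python
def multi_sink_config(agent="a1", channel="c1", num_sinks=4,
                      sink_type="hdfs",
                      hdfs_path="hdfs://namenode/flume/events"):
    """Build Flume properties for num_sinks sinks sharing one channel.

    The names and the HDFS path are placeholders; a complete agent would
    also need a source (or a Kafka channel instead of the memory channel).
    """
    sinks = ["k%d" % i for i in range(1, num_sinks + 1)]
    lines = [
        "%s.channels = %s" % (agent, channel),
        "%s.channels.%s.type = memory" % (agent, channel),
        "%s.sinks = %s" % (agent, " ".join(sinks)),
    ]
    for sink in sinks:
        prefix = "%s.sinks.%s" % (agent, sink)
        lines.append("%s.type = %s" % (prefix, sink_type))
        # Every sink points at the same channel, so each sink thread
        # takes its own batches from it.
        lines.append("%s.channel = %s" % (prefix, channel))
        if sink_type == "hdfs":
            lines.append("%s.hdfs.path = %s" % (prefix, hdfs_path))
            # A distinct prefix per sink (assumption) keeps the sinks
            # from writing files with clashing names.
            lines.append("%s.hdfs.filePrefix = events-%s" % (prefix, sink))
    return "\n".join(lines)


if __name__ == "__main__":
    print(multi_sink_config(num_sinks=2))
```

Passing `sink_type="null"` would emit Null Sinks instead, for the test
that isolates Flume from HDFS.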
