flume-user mailing list archives

From Bessenyei Balázs Donát <bes...@apache.org>
Subject Re: Understand JMS source + HDFS sink batch management
Date Wed, 16 Nov 2016 13:44:38 GMT
Hi Roberto,

Do you happen to have any information about the messages themselves?
(Looking at https://flume.apache.org/FlumeUserGuide.html#hdfs-sink ,
there is also the hdfs.rollSize setting, which might be relevant here.)
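
To make that concrete, here is a minimal sketch. It assumes the default
hdfs.rollSize (1024 bytes, per the user guide) is what closes your files,
which I have not verified against your setup. The value 100 for rollCount
is only taken from your stated expectation:

    # assumption: size-based rolling is the trigger; 0 disables it
    my_agent.sinks.my_sink.hdfs.rollSize = 0
    # roll the file after 100 events; hdfs.batchSize only controls how
    # many events are written before a flush to HDFS, not when a file closes
    my_agent.sinks.my_sink.hdfs.rollCount = 100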

Have you tried using a different setup for the source and channel? (It's
easy to try different things with the netcat source, for example:
message frequency and size are easily simulated.)
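
A rough sketch of such a test source, reusing your agent and component
names. The bind address and port are just placeholders I picked:

    # assumption: localhost:44444 is free on the machine running the agent
    my_agent.sources.my_source.type = netcat
    my_agent.sources.my_source.bind = localhost
    my_agent.sources.my_source.port = 44444
    my_agent.sources.my_source.channels = my_channel

Each line of text sent to that port (for example with nc localhost 44444)
should become one event, so you can vary message size and rate by hand.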


Thank you,

Donat


2016-11-16 11:35 GMT+01:00 Roberto Coluccio <roberto.coluccio@eng.it>:
> Hello folks,
>
> I'm testing a Flume agent defined by a topology made of :
>
> JMS source (Tibco implementation) -> memory channel -> hdfs sink
>
> The JMS source has:
>
> my_agent.sources.my_source.batchSize = 100
>
> The memory channel has:
>
> my_agent.channels.my_channel.capacity = 100
>
> The HDFS sink has:
>
> my_agent.sinks.my_sink.hdfs.batchSize = 100
> my_agent.sinks.my_sink.hdfs.rollCount = 0
> my_agent.sinks.my_sink.hdfs.rollInterval = 0
> my_agent.sinks.my_sink.hdfs.idleTimeout = 0
>
> I don't understand how/why new files on HDFS are created/closed. In fact,
> when I:
>
>     1. launch the agent (JMS queue empty)
>     2. push a new text message on the JMS queue
>
> It happens that a new file is created by the HDFS sink, but not yet closed
> (as I expect). BUT, when I
>
>     3. push again a new text message on the JMS queue
>
> regardless of how much time I waited to perform step 3, the HDFS sink closes
> the previously open file, then opens a new one for the new incoming message
> consumed from the queue and processed through the channel.
>
> This way, files will always have 1 and only 1 message inside them. I was
> expecting that number to be 100, according to the configuration mentioned
> above.
>
> Any hints?
>
> Best regards,
>
> Roberto
