flume-user mailing list archives

From "Thanh Hong Dai" <hdth...@tma.com.vn>
Subject Is it a good idea to use Flume Interceptor to process data?
Date Wed, 27 Jul 2016 11:39:43 GMT


To give some background: we are currently buffering monitoring data in
Kafka, where each message records several metrics at a point in time.

For each record, we need to perform some calculations based on the
metrics in the record, append the results (there are several) to the record,
and send the resulting record to a data store (let's call it DS1). All
data required for the calculation are encapsulated in the record,
essentially making this an embarrassingly parallel problem.

The formulas for the calculations are stored in a different data store (let's
call it DS2), and users can add, delete, or modify them. We are not
required to react to a change immediately, but we should do so within a
reasonable time (e.g. 5 minutes).


Currently, we have prototyped an implementation that performs the data
processing described above in an Interceptor. We define the source as
Kafka and the sink as the sink for DS1, and we attach the Interceptor to the
source. As described above, the Interceptor reads the formulas from DS2
regularly to pick up any changes, and is responsible for processing
the data as it comes in from Kafka.
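
To make the design concrete, here is a minimal sketch of the refresh-and-enrich
logic such an Interceptor could delegate to. It is plain Java with no Flume
dependency. The names Formula and FormulaEnricher, the map-based record
layout, and the loader that stands in for a DS2 query are all assumptions made
for illustration, not our actual code. In a real Interceptor, this would
presumably be created in initialize(), called from intercept(Event), and shut
down in close().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical formula type: computes one derived value from a record's metrics. */
interface Formula {
    double apply(Map<String, Double> metrics);
}

/** Hypothetical helper: holds the current formulas and reloads them periodically. */
final class FormulaEnricher implements AutoCloseable {
    // Result name -> formula. Replaced as a whole on each refresh, so readers
    // always see a consistent snapshot without locking.
    private final AtomicReference<Map<String, Formula>> formulas =
            new AtomicReference<>(Map.of());
    private final Supplier<Map<String, Formula>> loader; // stand-in for a DS2 query
    private final ScheduledExecutorService scheduler;

    FormulaEnricher(Supplier<Map<String, Formula>> loader, long period, TimeUnit unit) {
        this.loader = loader;
        refresh(); // load once up front so the first record already sees formulas
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "formula-refresh");
            t.setDaemon(true); // do not keep the JVM alive just for refreshing
            return t;
        });
        this.scheduler.scheduleWithFixedDelay(this::refresh, period, period, unit);
    }

    /** Reloads the formulas; on failure the previous snapshot stays in use. */
    void refresh() {
        try {
            formulas.set(Map.copyOf(loader.get()));
        } catch (RuntimeException e) {
            // Assumption: stale formulas are preferable to dropping records.
        }
    }

    /** Returns a copy of the record with every formula result appended. */
    Map<String, Double> enrich(Map<String, Double> metrics) {
        Map<String, Double> out = new LinkedHashMap<>(metrics);
        formulas.get().forEach((name, f) -> out.put(name, f.apply(metrics)));
        return out;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        // Made-up formula and metric names, purely for demonstration.
        Formula total = m -> m.get("cpu") + m.get("io");
        Supplier<Map<String, Formula>> fakeDs2 = () -> Map.of("total", total);
        try (FormulaEnricher enricher = new FormulaEnricher(fakeDs2, 5, TimeUnit.MINUTES)) {
            System.out.println(enricher.enrich(Map.of("cpu", 0.4, "io", 0.1)));
        }
    }
}
```

Because each call to enrich() depends only on the record passed in and on an
immutable snapshot of the formulas, it can be called from several threads or
several agents in parallel, which matches the embarrassingly parallel nature
of the problem.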


We are aware of other stream processing frameworks such as Spark or
Kafka. However, the implementation above is motivated by the fact that Flume
already provides reliable streaming, and we want to reuse as much code as
possible.


Is this usage of Flume a good idea in terms of performance and scalability?


Best regards,

Hong Dai Thanh.
