incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Flume and Cassandra
Date Fri, 10 Feb 2012 09:35:03 GMT
> How do I do it? Do I need to build a custom plugin/sink, or can I configure an existing
> sink to write data in a custom way?
This is a good starting point https://github.com/thobbs/flume-cassandra-plugin
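The "custom way" described in the question (store the raw event and bump a counter in the same append) could look roughly like the sketch below. This is not the flume-cassandra-plugin's actual API; the class and method names (StatsSink, appendEvent) are illustrative, and plain maps stand in for the Cassandra raw-event and counter column families.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a sink whose append path does two writes:
// the raw event, plus a counter increment for near real-time stats.
// Maps stand in for Cassandra column families.
class StatsSink {
    // Stand-in for a column family of raw events, keyed by row key.
    private final Map<String, String> rawEvents = new HashMap<>();
    // Stand-in for Cassandra counter columns, keyed by stat name.
    private final Map<String, Long> counters = new HashMap<>();

    void appendEvent(String rowKey, String body, String statKey) {
        rawEvents.put(rowKey, body);            // store the raw event
        counters.merge(statKey, 1L, Long::sum); // increment the stat counter
    }

    long count(String statKey) {
        return counters.getOrDefault(statKey, 0L);
    }

    int rawSize() {
        return rawEvents.size();
    }
}
```

In a real sink the two map writes would be Thrift (or CQL) mutations against two column families; the point is that one incoming event fans out to both.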

> 2 - My business process also uses my Cassandra DB (without Flume, directly via Thrift);
> how do I ensure that log writing won't overload my database and introduce latency into my
> business process?
Anytime you have a data stream you don't control, it's a good idea to put some sort of buffer
between the outside world and the database. Flume has a buffered sink; I think you
can subclass it and aggregate the counters for a minute or two: http://archive.cloudera.com/cdh/3/flume/UserGuide/#_buffered_sink_and_decorator_semantics
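The aggregation idea behind that suggestion can be sketched independently of the Flume API: accumulate counter increments in memory and flush one combined increment per key at each interval, so the cluster sees one counter write per key per minute instead of one per event. All names below are illustrative, and the flush callback stands in for the actual Cassandra counter update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of counter aggregation for a buffered sink: increments are
// merged in memory, and flush() emits one aggregated delta per key.
class CounterBuffer {
    private final Map<String, Long> pending = new HashMap<>();
    // Stand-in for the real Cassandra counter write (key, delta).
    private final BiConsumer<String, Long> flusher;

    CounterBuffer(BiConsumer<String, Long> flusher) {
        this.flusher = flusher;
    }

    synchronized void increment(String key, long delta) {
        pending.merge(key, delta, Long::sum);
    }

    // In a real sink this would be driven by a timer every minute or two.
    synchronized void flush() {
        pending.forEach(flusher);
        pending.clear();
    }
}
```

Because the flush interval bounds the write rate to the cluster, a load spike in the log stream grows the in-memory map rather than the number of Cassandra writes, which is exactly the back-pressure the question asks about.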

Hope that helps. 
A
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/02/2012, at 4:27 AM, Alain RODRIGUEZ wrote:

> Hi,
> 
> 1 - I would like to generate some statistics and store some raw events from log files
> tailed with Flume. I saw some plugins providing Cassandra sinks, but I would like to store
> data in a custom way: storing raw data but also incrementing counters to get near real-time
> statistics. How do I do it? Do I need to build a custom plugin/sink, or can I configure an
> existing sink to write data in a custom way?
> 
> 2 - My business process also uses my Cassandra DB (without Flume, directly via Thrift);
> how do I ensure that log writing won't overload my database and introduce latency into my
> business process? I mean, is there a way to manage the throughput sent by Flume's tails and
> slow them down when my Cassandra cluster is overloaded? I would like to avoid building two
> separate clusters.
> 
> Thank you,
> 
> Alain
> 

