1 - I would like to generate statistics and store raw events from log files tailed with Flume. I have seen plugins that provide Cassandra sinks, but I want to store the data in a custom way: keeping the raw events while also incrementing counters to get near real-time statistics. How can I do this? Do I need to build a custom plugin/sink, or can I configure an existing sink to write data in this custom way?
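To make question 1 concrete, here is the kind of custom sink I imagine writing if a custom plugin is indeed required. This is only a sketch against the Flume NG sink API (`AbstractSink`/`Configurable`); `CassandraClient`, `insertRaw`, `incrementCounter`, and `extractKey` are hypothetical placeholders for my own Thrift access layer, not real library calls:

```java
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class StatsCassandraSink extends AbstractSink implements Configurable {

    // Hypothetical wrapper around my existing Thrift-based Cassandra access.
    private CassandraClient client;

    @Override
    public void configure(Context context) {
        // Connection settings come from the agent's properties file.
        String hosts = context.getString("hosts", "localhost:9160");
        client = new CassandraClient(hosts);
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction txn = channel.getTransaction();
        txn.begin();
        try {
            Event event = channel.take();
            if (event == null) {
                // Channel is empty; tell Flume to back off for a while.
                txn.commit();
                return Status.BACKOFF;
            }
            // 1) store the raw event body
            client.insertRaw(event.getBody());
            // 2) increment a counter column for near real-time stats
            client.incrementCounter(extractKey(event));
            txn.commit();
            return Status.READY;
        } catch (Exception e) {
            txn.rollback();
            throw new EventDeliveryException("Failed to write to Cassandra", e);
        } finally {
            txn.close();
        }
    }

    private String extractKey(Event event) {
        // Hypothetical: derive a counter row key from the event body.
        return new String(event.getBody()).split(" ")[0];
    }
}
```

Is this transaction pattern (take from channel, write both the raw row and the counter, commit or roll back) the right way to get both kinds of writes from a single sink?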
2 - My business process also uses my Cassandra DB (without Flume, directly via Thrift). How can I ensure that log writing won't overload the database and introduce latency into my business process? In other words, is there a way to manage the throughput sent by Flume's tails and slow them down when my Cassandra cluster is overloaded? I would like to avoid building two separate clusters.
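For question 2, the only throttling knob I am aware of is sizing the channel between the tail source and the sink: a bounded memory channel should fill up when the sink falls behind, back-pressuring the source instead of hammering Cassandra. A minimal agent config sketch, assuming Flume NG (all names, paths, and the sink class are illustrative):

```
# Illustrative Flume NG agent configuration
agent.sources = tail1
agent.channels = ch1
agent.sinks = cassandraSink

agent.sources.tail1.type = exec
agent.sources.tail1.command = tail -F /var/log/app.log
agent.sources.tail1.channels = ch1

# Bounded memory channel: when the sink is slow, the channel fills
# and the source blocks, limiting pressure on Cassandra.
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 10000
agent.channels.ch1.transactionCapacity = 100

agent.sinks.cassandraSink.type = com.example.StatsCassandraSink
agent.sinks.cassandraSink.channel = ch1
```

Is channel capacity enough here, or is there a mechanism that reacts to actual Cassandra load rather than just queue depth?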