cassandra-user mailing list archives

From Ipremyadav <ipremya...@gmail.com>
Subject Re: Cassandra - Spark - Flume: best architecture for log analytics.
Date Thu, 23 Jul 2015 07:51:25 GMT
Though DSE Cassandra comes with Hadoop integration, this is clearly a use case for Hadoop.

Any reason why Cassandra is your first choice?



> On 23 Jul 2015, at 6:12 a.m., Pierre Devops <pierredevops@gmail.com> wrote:
> 
> Cassandra is not very good at massive/bulk reads when you need to retrieve and compute
> over a large amount of data on multiple machines using something like Spark or Hadoop
> (unless you process the SSTables directly, which is not natively supported, so you'll
> have to hack your way).
> 
> However, it's very good at storing and retrieving data once it has been processed and
> sorted. That's why I would opt for solution 2), or for another solution that processes
> data before inserting it into Cassandra and doesn't use Cassandra as a temporary store.
> 
> 2015-07-23 2:04 GMT+02:00 Renato Perini <renato.perini@gmail.com>:
>> Problem: Log analytics.
>> 
>> Solutions:
>>        1) Aggregating logs using Flume and storing the aggregations in Cassandra.
>>           Spark reads the data from Cassandra, makes some computations
>>           and writes the results to distinct tables, still in Cassandra.
>>        2) Aggregating logs using Flume to a sink, streaming the data directly into Spark.
>>           Spark makes some computations and stores the results in Cassandra.
>>        3) *** your solution ***
>> 
>> Which is the best workflow for this task?
>> I would like to set up something flexible enough to allow me to use batch processing
>> and real-time streaming without major fuss.
>> 
>> Thank you in advance.
> 
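
To illustrate the "process before inserting" idea behind solution 2), here is a minimal sketch in plain Python. It is not Spark, Flume, or Cassandra code: the log line format ("ISO-timestamp LEVEL message"), the function names parse_line and aggregate_per_minute, and the per-minute grouping are all assumptions made for illustration. It only shows the shape of the step that would run in the streaming layer, reducing raw events to compact rows before anything is written to the store.

```python
from collections import Counter
from datetime import datetime


def parse_line(line):
    """Split a hypothetical 'ISO-timestamp LEVEL message' log line."""
    timestamp, level, _message = line.split(" ", 2)
    return datetime.fromisoformat(timestamp), level


def aggregate_per_minute(lines):
    """Count events per (minute, level); each entry stands for one row to store."""
    counts = Counter()
    for line in lines:
        when, level = parse_line(line)
        # Truncate to the minute so events in the same minute share a key.
        minute = when.replace(second=0, microsecond=0)
        counts[(minute, level)] += 1
    return counts


# Hypothetical sample input, standing in for events arriving from a Flume sink.
sample = [
    "2015-07-23T07:51:25 ERROR disk full",
    "2015-07-23T07:51:40 ERROR disk full",
    "2015-07-23T07:52:01 INFO compaction done",
]
rows = aggregate_per_minute(sample)
```

Only the aggregated rows would then be written to Cassandra, so the database serves already-processed results rather than acting as a temporary store for raw logs.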
