cassandra-user mailing list archives

From Peter Lin <wool...@gmail.com>
Subject Re: Best approach in Cassandra (+ Spark?) for Continuous Queries?
Date Sat, 03 Jan 2015 11:58:28 GMT

It looks like you're using the wrong tool and architecture.

If the use case really needs continuous queries in the style of event processing, use an
ESP (event stream processing) product for that. You can still store the data in Cassandra
for persistence.

The design you want has two paths: event stream and persistence. At the entry point,
the system makes two parallel calls: one goes to a messaging system that feeds the ESP, and
the second writes to Cassandra.
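
A rough sketch of that two-path entry point, in Python, might look like the following. Everything named here is an assumption for illustration: `InMemoryBus` stands in for whatever messaging system feeds the ESP, `InMemoryStore` stands in for the Cassandra write path, and `Ingest` is a hypothetical entry-point class. None of these are real driver or broker APIs; in a real system each stand-in would wrap the actual client.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, wait


class InMemoryBus:
    """Hypothetical stand-in for the messaging system that feeds the ESP."""

    def __init__(self):
        self.events = queue.Queue()

    def publish(self, event):
        self.events.put(event)


class InMemoryStore:
    """Hypothetical stand-in for the Cassandra write path."""

    def __init__(self):
        self.rows = {}

    def save(self, event):
        self.rows[event["id"]] = event


class Ingest:
    """Entry point: fans each incoming event out to both paths in parallel."""

    def __init__(self, bus, store, workers=4):
        self.bus = bus
        self.store = store
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def accept(self, event):
        # Issue both calls at once: one to the event stream, one to persistence.
        futures = [
            self.pool.submit(self.bus.publish, event),
            self.pool.submit(self.store.save, event),
        ]
        wait(futures)
        for future in futures:
            future.result()  # re-raise any error from either path

    def close(self):
        self.pool.shutdown(wait=True)
```

Calling `Ingest(bus, store).accept({"id": "ship-42", "lat": 36.6, "lon": -121.9})` sends the same event down both paths. This sketch waits for both calls before returning; whether the caller should block on the persistence write, or how to handle one path failing while the other succeeds, is a design decision the sketch does not settle.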


Sent from my iPhone

> On Jan 3, 2015, at 5:46 AM, Hugo José Pinto <hugo.pinto@inovaworks.com> wrote:
> 
> Hello.
> 
> We're currently using Hazelcast (http://hazelcast.org/) as a distributed in-memory data
grid. That's been working sort-of-well for us, but going solely in-memory has exhausted its
path in our use case, and we're considering porting our application to a NoSQL persistent
store. After the usual comparisons and evaluations, we're borderline close to picking Cassandra,
plus eventually Spark for analytics.
> 
> Nonetheless, there is a gap in our architectural needs that we're still not grasping
how to solve in Cassandra (with or without Spark): Hazelcast allows us to create a Continuous
Query in that, whenever a row is added/removed/modified from the clause's resultset, Hazelcast
calls us back with the corresponding notification. We use this to continuously update the
clients via AJAX streaming with the new/changed rows.
> 
> This is probably a conceptual mismatch we're making, so - how to best address this use
case in Cassandra (with or without Spark's help)? Is there something in the API that allows
for Continuous Queries on key/clause changes (haven't found it)? Is there some other way to
get a stream of key/clause updates? Events of some sort?
> 
> I'm aware that we could, eventually, periodically poll Cassandra, but in our use case,
the client is potentially interested in a large number of table clause notifications (think
"all changes to Ship positions on California's coastline"), and iterating out of the store
would kill the streamer's scalability.
> 
> Hence, the magic question: what are we missing? Is Cassandra the wrong tool for the job?
Are we not aware of a particular part of the API or external library in/outside the apache
realm that would allow for this?
> 
> Many thanks for any assistance!
> 
> Hugo
