Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of hugo.pinto@inovaworks.com
 designates 209.85.212.173 as permitted sender)
MIME-Version: 1.0
From: =?UTF-8?Q?Hugo_Jos=C3=A9_Pinto?= <hugo.pinto@inovaworks.com>
Date: Sat, 3 Jan 2015 10:46:58 +0000
Message-ID: 
 <CAF3wBe0ep8TF6xxcPiqxAwzuqEiysiHCV6RM4NJDF4XqSC_T9w@mail.gmail.com>
Subject: Best approach in Cassandra (+ Spark?) for Continuous Queries?
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=089e010d83fc68b264050bbd2eec

--089e010d83fc68b264050bbd2eec
Content-Type: text/plain; charset=UTF-8

Hello.

We're currently using Hazelcast (http://hazelcast.org/) as a distributed
in-memory data grid. That has worked reasonably well for us, but a purely
in-memory approach has run its course for our use case, and we're
considering porting our application to a persistent NoSQL store. After the
usual comparisons and evaluations, we're close to picking Cassandra,
possibly adding Spark later for analytics.

Nonetheless, there is a gap in our architectural needs that we still don't
see how to fill in Cassandra (with or without Spark): Hazelcast lets us
create a Continuous Query such that, whenever a row is added to, removed
from, or modified within the clause's result set, Hazelcast calls us back
with the corresponding notification. We use this to continuously update the
clients via AJAX streaming with the new/changed rows.
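
To make the semantics we rely on concrete, here is a minimal Python sketch
of a continuous query over a toy in-memory store. It is not Hazelcast's API:
the class ContinuousQueryStore, its methods, and the "region" field are
hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Event = Tuple[str, str, Optional[Row]]  # (kind, key, row or None)
Predicate = Callable[[Row], bool]
Callback = Callable[[Event], None]


class ContinuousQueryStore:
    """Toy in-memory store (hypothetical, not a real library API).

    Each registered query is a predicate plus a callback. On every write the
    store checks whether the row entered, stayed in, or left the result set
    of each query and notifies the callback accordingly.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}
        self._queries: List[Tuple[Predicate, Callback]] = []

    def register(self, predicate: Predicate, callback: Callback) -> None:
        self._queries.append((predicate, callback))

    def put(self, key: str, row: Row) -> None:
        old = self._rows.get(key)
        self._rows[key] = row
        for predicate, callback in self._queries:
            was_in = old is not None and predicate(old)
            is_in = predicate(row)
            if is_in and not was_in:
                callback(("ADDED", key, row))
            elif is_in and was_in:
                callback(("MODIFIED", key, row))
            elif was_in and not is_in:
                callback(("REMOVED", key, None))

    def delete(self, key: str) -> None:
        old = self._rows.pop(key, None)
        if old is None:
            return
        for predicate, callback in self._queries:
            if predicate(old):
                callback(("REMOVED", key, None))


# Usage: one client watching ships whose (assumed) region is "california".
events: List[Event] = []
store = ContinuousQueryStore()
store.register(lambda r: r["region"] == "california", events.append)

store.put("ship-1", {"region": "california", "lat": 36.6, "lon": -121.9})
store.put("ship-1", {"region": "california", "lat": 36.7, "lon": -122.0})
store.put("ship-1", {"region": "oregon", "lat": 44.6, "lon": -124.1})
store.put("ship-2", {"region": "oregon", "lat": 45.0, "lon": -124.0})
```

The point is that the store pushes only the rows that changed relative to
the clause; the client never re-reads the whole result set.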

This is probably a conceptual mismatch on our part, so: how can we best
address this use case in Cassandra (with or without Spark's help)? Is there
something in the API that allows for Continuous Queries on key/clause
changes (we haven't found it)? Is there some other way to get a stream of
key/clause updates? Events of some sort?

I'm aware that we could, as a fallback, poll Cassandra periodically, but in
our use case a client is potentially interested in a large number of table
clause notifications (think "all changes to ship positions on California's
coastline"), and repeatedly iterating over the store would kill the
streamer's scalability.
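
For comparison, this is roughly what the polling fallback would look like: a
minimal Python sketch under our own assumptions, with no Cassandra driver
involved. The function names and the shape of the snapshots (a dict of row
key to row) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Snapshot = Dict[str, Row]        # row key -> row, for one clause
Change = Tuple[str, str]         # (kind, key)


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Change]:
    """Derive change notifications by comparing two full result sets."""
    changes: List[Change] = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("ADDED", key))
        elif previous[key] != row:
            changes.append(("MODIFIED", key))
    for key in previous:
        if key not in current:
            changes.append(("REMOVED", key))
    return changes


def poll_once(fetch: Callable[[], Snapshot],
              previous: Snapshot) -> Tuple[List[Change], Snapshot]:
    """One polling cycle: re-read the whole result set, then diff it.

    `fetch` stands in for whatever query would read the clause's rows
    from the store (an assumption; it is not a real driver call).
    """
    current = fetch()
    return diff_snapshots(previous, current), current


# Usage with two hand-written snapshots instead of a real store.
before: Snapshot = {
    "ship-1": {"lat": 36.6, "lon": -121.9},
    "ship-2": {"lat": 34.0, "lon": -119.0},
}
after: Snapshot = {
    "ship-1": {"lat": 36.7, "lon": -122.0},
    "ship-3": {"lat": 37.8, "lon": -122.5},
}
changes, latest = poll_once(lambda: after, before)
```

Every cycle reads and compares the full result set, per clause and per
client, even when nothing changed. That cost is what worries us.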

Hence, the magic question: what are we missing? Is Cassandra the wrong tool
for the job? Is there a part of the API, or an external library inside or
outside the Apache realm, that we're not aware of and that would allow for
this?

Many thanks for any assistance!

Hugo

--089e010d83fc68b264050bbd2eec--