From: Hugo José Pinto
Date: Sat, 3 Jan 2015 10:46:58 +0000
Subject: Best approach in Cassandra (+ Spark?) for Continuous Queries?
To: user@cassandra.apache.org

Hello.

We're currently using Hazelcast (http://hazelcast.org/) as a distributed in-memory data grid. That's been working sort-of-well for us, but going solely in-memory has exhausted its path in our use case, and we're considering porting our application to a NoSQL persistent store. After the usual comparisons and evaluations, we're close to picking Cassandra, plus eventually Spark for analytics.

Nonetheless, there is a gap in our architectural needs that we still don't see how to fill in Cassandra (with or without Spark): Hazelcast allows us to create a Continuous Query such that, whenever a row is added to, removed from, or modified within the clause's resultset, Hazelcast calls us back with the corresponding notification. We use this to continuously update the clients via AJAX streaming with the new/changed rows.
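To make the behaviour we rely on concrete, here is a minimal, library-free sketch in Python of what we mean by a continuous query. It is not Hazelcast's API: the class `ContinuousQueryMap`, the method `add_listener`, the event names, and the `region` field are all hypothetical names made up for illustration. The only point is that matching happens at write time and the store pushes the change to us.

```python
from typing import Any, Callable, Dict, List, Tuple

Predicate = Callable[[Any], bool]
Callback = Callable[[str, Any, Any], None]  # receives (event, key, value)

_MISSING = object()  # sentinel: key had no previous value


class ContinuousQueryMap:
    """Hypothetical in-memory map that pushes resultset changes to listeners."""

    def __init__(self) -> None:
        self._data: Dict[Any, Any] = {}
        self._listeners: List[Tuple[Predicate, Callback]] = []

    def add_listener(self, predicate: Predicate, callback: Callback) -> None:
        # Register a standing query: callback fires when a write changes
        # the set of rows matching predicate.
        self._listeners.append((predicate, callback))

    def put(self, key: Any, value: Any) -> None:
        old = self._data.get(key, _MISSING)
        self._data[key] = value
        for predicate, callback in self._listeners:
            was_in = old is not _MISSING and predicate(old)
            is_in = predicate(value)
            if is_in and not was_in:
                callback("added", key, value)      # row entered the resultset
            elif is_in and was_in:
                callback("updated", key, value)    # row changed inside it
            elif was_in and not is_in:
                callback("removed", key, old)      # row left the resultset

    def remove(self, key: Any) -> None:
        if key not in self._data:
            return
        old = self._data.pop(key)
        for predicate, callback in self._listeners:
            if predicate(old):
                callback("removed", key, old)


if __name__ == "__main__":
    ships = ContinuousQueryMap()
    ships.add_listener(
        lambda ship: ship["region"] == "california-coast",
        lambda event, key, value: print(event, key, value),
    )
    ships.put("ship-1", {"region": "california-coast", "lat": 34.0})
    ships.put("ship-1", {"region": "california-coast", "lat": 34.1})
    ships.put("ship-1", {"region": "open-ocean", "lat": 30.0})
```

What we are looking for is the equivalent of `add_listener` on the Cassandra side, so that the client does not have to re-run the query to learn what changed.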
This is probably a conceptual mismatch we're making, so - how to best address this use case in Cassandra (with or without Spark's help)? Is there something in the API that allows for Continuous Queries on key/clause changes (haven't found it)? Is there some other way to get a stream of key/clause updates? Events of some sort?

I'm aware that we could periodically poll Cassandra, but in our use case, the client is potentially interested in a large number of table clause notifications (think "all changes to Ship positions on California's coastline"), and iterating out of the store would kill the streamer's scalability.

Hence, the magic question: what are we missing? Is Cassandra the wrong tool for the job? Are we not aware of a particular part of the API or external library in/outside the Apache realm that would allow for this?

Many thanks for any assistance!

Hugo
