Hi Oleg,

I am the creator of Calliope. Calliope doesn't force any deployment model, which means you can run it with Mesos, Hadoop, or Standalone. To be fair, I don't think the other libraries mentioned here force one either; they should work too.

Spark cluster HA can be provided using ZooKeeper even in the standalone deployment mode.
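
As a rough sketch of the ZooKeeper-based standalone HA mentioned above: the masters are pointed at a ZooKeeper ensemble through conf/spark-env.sh. The host names zk1, zk2, zk3 and the /spark directory below are placeholders I made up for illustration, so treat this as a starting point and not a tested setup.

```shell
# conf/spark-env.sh on each standalone master node.
# zk1/zk2/zk3 and /spark are hypothetical placeholders; substitute your own ensemble.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```

Workers and applications would then list every master in their URL, for example spark://master1:7077,master2:7077 (again, hypothetical hosts), so they can fail over to whichever master is elected leader.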

Can you explain what you mean by "in memory aggregations" not being possible? With Calliope able to use Cassandra's secondary indexes as well as our Stargate indexes (distributed Lucene indexing for C*), I am sure we can handle any scenario. Calliope is used in production at many large organizations on very large data sets.

Feel free to mail me directly, and we can work with you to get you started.


Founder & CEO, Tuplejump, Inc.
The Data Engineering Platform

On Thu, Sep 11, 2014 at 8:09 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
DataStax and Stratio require Mesos, Hadoop YARN, or another third party to get Spark cluster HA.

What about Calliope?
Is it sufficient to have Cassandra + Calliope + Spark to be able to process aggregations?
In my case we have quite a lot of data, so doing aggregation only in memory is impossible.

Does Calliope support a non-in-memory mode for Spark?


On Thu, Sep 11, 2014 at 9:23 PM, abhinav chowdary <abhinav.chowdary@gmail.com> wrote:
Adding to the conversation...

There are 3 great open source options available:

1. Calliope http://tuplejump.github.io/calliope/
    This was the first library out, some time late last year (as I recall), and I have been using it for a while. It is mostly very stable and uses the Hadoop I/O in Cassandra (note that it doesn't require Hadoop).

2. DataStax Spark Cassandra Connector https://github.com/datastax/spark-cassandra-connector: The main difference is that this uses CQL3. Again a great library, but it has a few issues. It is also very actively developed, and it still uses Thrift for minor things, but all the heavy lifting is done in CQL3.

3. Stratio Deep https://github.com/Stratio/stratio-deep: Has a lot more to offer if you use the whole Stratio stack. Deep is for Spark, Stratio Streaming is built on top of Spark Streaming, Stratio Meta is something similar to Shark or Spark SQL, and finally Stratio Cassandra is a fork of Cassandra with advanced Lucene-based indexing.