cassandra-user mailing list archives

From Rohit Rai <ro...@tuplejump.com>
Subject Re: cassandra + spark / pyspark
Date Thu, 11 Sep 2014 14:51:37 GMT
Hi Oleg,

I am the creator of Calliope. Calliope doesn't force any deployment
model, which means you can run it with Mesos, Hadoop, or Standalone. To
be fair, I don't think the other libs mentioned here force one either.

The Spark cluster HA can be provided using ZooKeeper even in the standalone
deployment mode.
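
As a rough sketch of what that involves: standalone masters are put into
ZooKeeper recovery mode through the spark.deploy.* properties passed in
SPARK_DAEMON_JAVA_OPTS (in conf/spark-env.sh), and applications list every
master in their spark:// URL. The helper names, host names, and the /spark
directory below are placeholders for illustration, not values from this
thread.

```python
# Sketch only: builds the strings used for Spark standalone HA with ZooKeeper.
# Host names and the "/spark" znode directory are hypothetical placeholders.

def daemon_java_opts(zk_hosts, zk_dir="/spark"):
    """Value for SPARK_DAEMON_JAVA_OPTS in conf/spark-env.sh on each master."""
    opts = [
        ("spark.deploy.recoveryMode", "ZOOKEEPER"),
        ("spark.deploy.zookeeper.url", ",".join(zk_hosts)),
        ("spark.deploy.zookeeper.dir", zk_dir),
    ]
    return " ".join("-D%s=%s" % (key, value) for key, value in opts)


def master_url(masters):
    """spark:// URL naming all masters, so a client can find the current leader."""
    return "spark://" + ",".join(masters)


if __name__ == "__main__":
    print(daemon_java_opts(["zk1:2181", "zk2:2181", "zk3:2181"]))
    print(master_url(["master1:7077", "master2:7077"]))
```

The first string goes on every master node; the second is what an
application would pass as its master URL.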


Can you explain what you mean by "in memory aggregations" not being
possible? With Calliope being able to use the secondary indexes and
also our Stargate indexes (distributed Lucene indexing for C*), I am sure
we can handle any scenario. Calliope is used in production at many large
organizations over very large datasets.
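
To make the aggregation point concrete, here is a plain-Python illustration
(not Calliope or Spark API) of the general idea behind per-key aggregation
in Spark: each partition is folded into small per-key partial results, and
only those partials are merged, so the full dataset never has to sit in
memory at once. The function names and sample rows are made up for the
example.

```python
from collections import defaultdict


def aggregate_partition(rows):
    """Fold one partition into per-key [sum, count]; rows is any iterator."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return acc


def merge_partials(partials):
    """Merge per-partition partial results and return the average per key."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (subtotal, count) in part.items():
            total[key][0] += subtotal
            total[key][1] += count
    return {key: s / n for key, (s, n) in total.items()}


# Each partition is consumed as a stream; only the small per-key
# accumulators are kept, never the full list of rows.
partitions = [
    iter([("a", 1.0), ("b", 4.0)]),
    iter([("a", 3.0)]),
]
averages = merge_partials(aggregate_partition(p) for p in partitions)
```

Memory use here grows with the number of distinct keys, not with the number
of rows.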

Feel free to mail me directly, and we can work with you to get you started.

Regards,
Rohit


*Founder & CEO, **Tuplejump, Inc.*
____________________________
www.tuplejump.com
*The Data Engineering Platform*

On Thu, Sep 11, 2014 at 8:09 PM, Oleg Ruchovets <oruchovets@gmail.com>
wrote:

> Ok.
>    DataStax and Stratio require Mesos, Hadoop YARN, or another third party to
> get Spark cluster HA.
>
> What about Calliope?
> Is it sufficient to have Cassandra + Calliope + Spark to be able to process
> aggregations?
> In my case we have quite a lot of data, so doing aggregation only in memory
> is impossible.
>
> Does Calliope support a non-in-memory mode for Spark?
>
> Thanks
> Oleg.
>
> On Thu, Sep 11, 2014 at 9:23 PM, abhinav chowdary <
> abhinav.chowdary@gmail.com> wrote:
>
>> Adding to the conversation...
>>
>> There are 3 great open source options available:
>>
>> 1. Calliope http://tuplejump.github.io/calliope/
>>     This is the first library that came out, some time late last year (as I
>> can recall). I have been using it for a while; it is mostly very stable and
>> uses Hadoop I/O in Cassandra (note that it doesn't require Hadoop).
>>
>> 2. DataStax Spark Cassandra Connector
>> https://github.com/datastax/spark-cassandra-connector: The main difference
>> is that this uses CQL3. Again a great library, but it has a few issues. It is
>> by far the most actively developed, and still uses Thrift for minor stuff,
>> with all the heavy lifting done in CQL3.
>>
>> 3. Stratio Deep https://github.com/Stratio/stratio-deep: Has a lot more to
>> offer if you use the whole Stratio stack: Deep is for Spark, Stratio Streaming
>> is built on top of Spark Streaming, Stratio Meta is something similar to
>> Shark or Spark SQL, and finally Stratio Cassandra is a fork of Cassandra
>> with advanced Lucene-based indexing.
>>
>>
>>
>
