ignite-user mailing list archives

From vincent gromakowski <vincent.gromakow...@gmail.com>
Subject Re: spark SQL thriftserver over ignite and cassandra
Date Tue, 04 Oct 2016 07:54:07 GMT
I know that Ignite has SQL support, but:
- The ODBC driver doesn't seem to support HTTP(S), which is easier to
integrate into corporate networks with firewall rules and proxies.
- The SQL engine doesn't seem to scale the way Spark SQL does. For instance,
Spark won't throw an OOM if the dataset (source or result) doesn't fit in
memory. On the Ignite side, this is not clear.
- The Spark thriftserver can handle multi-tenancy: different users can connect
to the same SQL engine and share the cache. In Ignite it's one cache per user,
which is a big waste of RAM.
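The usual reason behind the second point is that Spark can spill intermediate data (for sorts and aggregations) to disk instead of keeping everything on the heap. The block below illustrates that general technique in a hedged way. It is a minimal Python sketch of a group-by sum that holds at most `max_keys_in_memory` distinct keys in RAM and spills hash-partitioned partial sums to temporary files. It is not Spark's actual implementation, and the function name and parameters are hypothetical.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def external_group_sum(pairs, max_keys_in_memory=1000, num_partitions=8):
    """Yield (key, sum_of_values) for an iterable of (key, value) pairs while
    holding at most `max_keys_in_memory` distinct keys in the in-memory map."""
    with tempfile.TemporaryDirectory() as spill_dir:
        paths = [os.path.join(spill_dir, f"part-{p}.bin") for p in range(num_partitions)]

        def spill(partial):
            # Append each (key, partial_sum) record to its hash partition's file,
            # so all partial sums for a given key end up in the same file.
            files = [open(path, "ab") for path in paths]
            try:
                for key, total in partial.items():
                    pickle.dump((key, total), files[hash(key) % num_partitions])
            finally:
                for f in files:
                    f.close()

        # Phase 1: aggregate in memory, spilling whenever the map is "full".
        partial = defaultdict(int)
        for key, value in pairs:
            partial[key] += value
            if len(partial) >= max_keys_in_memory:
                spill(partial)
                partial = defaultdict(int)
        spill(partial)  # flush the remainder (also creates every partition file)

        # Phase 2: merge one partition at a time; only that partition's keys
        # are in memory, and results are streamed out instead of collected.
        for path in paths:
            merged = defaultdict(int)
            with open(path, "rb") as f:
                while True:
                    try:
                        key, total = pickle.load(f)
                    except EOFError:
                        break
                    merged[key] += total
            yield from merged.items()


rows = [("FR", 10), ("DE", 7), ("FR", 5), ("US", 1)] * 100
print(sorted(external_group_sum(rows, max_keys_in_memory=2, num_partitions=4)))
# [('DE', 700), ('FR', 1500), ('US', 100)]
```

The result is correct no matter how small the memory budget is. A smaller budget only produces more spills, which trades disk I/O for bounded memory instead of failing with an OOM.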

What I want to achieve is:
- use Cassandra as the data store because its writes are idempotent (HDFS/Hive
writes aren't), giving exactly-once semantics without any duplicates;
- use the Spark SQL thriftserver in multi-tenant mode for large-scale ad hoc
analytics queries (> TB) from an ODBC driver over HTTP(S);
- accelerate Cassandra reads when the data model of the Cassandra table
doesn't fit the queries. Queries would be OLAP-style: they target multiple C*
partitions and group by or filter on many dimensions that aren't
necessarily in the C* table key.
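The idempotence argument in the first point can be made concrete. In Cassandra, writes are upserts on the primary key. If a batch is replayed after a failure (at-least-once delivery), the existing rows are overwritten rather than duplicated, so the end result is effectively exactly-once. Appending files to an HDFS directory adds a second copy instead. The following is a minimal Python simulation in which a plain dict and a plain list stand in for the two kinds of store. The function names are hypothetical, and the `id`/`data` columns mirror the example table further down the thread.

```python
def replay_into_upsert_store(store, rows):
    """Keyed store with upsert semantics (like a Cassandra table keyed by id):
    writing the same row again overwrites it instead of adding a copy."""
    for row in rows:
        store[row["id"]] = row
    return store


def replay_into_append_log(log, rows):
    """Append-only sink (like new files landing in an HDFS directory):
    every write adds records, so a retried batch is duplicated."""
    log.extend(rows)
    return log


batch = [{"id": 1, "data": "x"}, {"id": 2, "data": "y"}]

# Simulate an at-least-once pipeline that writes the same batch twice
# (first attempt plus one retry after a failure).
store, log = {}, []
for _attempt in range(2):
    replay_into_upsert_store(store, batch)
    replay_into_append_log(log, batch)

print(len(store), len(log))  # 2 4
```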

Thanks for your advice.

2016-10-04 6:51 GMT+02:00 Jörn Franke <jornfranke@gmail.com>:

> I am not sure that this will be performant. What do you want to achieve
> here? Fast lookups? Then the Cassandra Ignite store might be the right
> solution. If you want to run more analytics-style queries, you can put
> the data on HDFS/Hive and use the Ignite HDFS cache to keep certain
> Hive partitions/tables in memory. If you want to move on to iterative machine
> learning algorithms, you can run Spark on top of this and then also use
> the Ignite cache for Spark RDDs.
> On 4 Oct 2016, at 02:24, Alexey Kuznetsov <akuznetsov@gridgain.com> wrote:
> Hi, Vincent!
> Ignite also has SQL support (also scalable); I think querying Ignite
> directly will be much faster than querying it through Spark.
> Also, please note that before executing queries you should load all the
> needed data into the cache.
> To load data from Cassandra into Ignite you can use the Cassandra store [1].
> [1] https://apacheignite.readme.io/docs/ignite-with-apache-cassandra
> On Tue, Oct 4, 2016 at 4:19 AM, vincent gromakowski <
> vincent.gromakowski@gmail.com> wrote:
>> Hi,
>> I am evaluating the possibility of using Spark SQL (and its scalability)
>> over an Ignite cache with a Cassandra persistent store to speed up read
>> workloads such as OLAP-style analytics.
>> Is there any way to configure the Spark thriftserver to load an external
>> table from Ignite, as we can do with Cassandra?
>> Here is an example config for Spark backed by Cassandra:
>>         ( id int, data string )
>>         STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
>>         TBLPROPERTIES ( "cassandra.host" = "x.x.x.x",
>>           "cassandra.ks.name" = "test",
>>           "cassandra.cf.name" = "mytable",
>>           "cassandra.ks.repfactor" = "1",
>>           "cassandra.ks.strategy" = "org.apache.cassandra.locator.SimpleStrategy" );
> --
> Alexey Kuznetsov
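Two points in this thread can be illustrated together with a toy model. The first is Alexey's advice that the needed data must be loaded into the cache before it can be queried. The second is the OLAP-style requirement Vincent lists: filters and group-bys on columns outside the Cassandra table key. The sketch below is plain Python and does not use Ignite's API. `InMemoryCache`, `load_all` and the table contents are hypothetical. In Ignite itself, the bulk-load step would presumably be a call to `IgniteCache.loadCache(...)`, which delegates to the configured `CacheStore` (here the Cassandra store from [1]).

```python
from collections import defaultdict

# Hypothetical rows of a Cassandra table whose primary key is `id`;
# the other column names are illustrative only.
BACKING_TABLE = {
    1: {"id": 1, "country": "FR", "product": "a", "amount": 10},
    2: {"id": 2, "country": "FR", "product": "b", "amount": 5},
    3: {"id": 3, "country": "DE", "product": "a", "amount": 7},
    4: {"id": 4, "country": "DE", "product": "a", "amount": 3},
}


class InMemoryCache:
    """Toy cache in front of a persistent store. Queries only see loaded rows."""

    def __init__(self, store):
        self._store = store
        self._rows = {}

    def load_all(self):
        # Bulk-load every row from the backing store into memory.
        self._rows = dict(self._store)
        return len(self._rows)

    def group_sum(self, dimension, measure, where=lambda row: True):
        # Filter and group by any column, including ones outside the primary key;
        # this scans the cached rows and never goes back to the store.
        totals = defaultdict(int)
        for row in self._rows.values():
            if where(row):
                totals[row[dimension]] += row[measure]
        return dict(totals)


cache = InMemoryCache(BACKING_TABLE)
print(cache.group_sum("country", "amount"))  # {} (nothing loaded yet)
print(cache.load_all())                      # 4
print(cache.group_sum("country", "amount"))  # {'FR': 15, 'DE': 10}
print(cache.group_sum("country", "amount", where=lambda r: r["product"] == "a"))  # {'FR': 10, 'DE': 10}
```

The first query returns nothing because the toy cache never reads through to the store during a query. This mirrors why the data has to be loaded up front.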
