ignite-user mailing list archives

From vincent gromakowski <vincent.gromakow...@gmail.com>
Subject Re: spark SQL thriftserver over ignite and cassandra
Date Tue, 04 Oct 2016 21:31:46 GMT
Do you have any remarks or corrections on my assumptions?

On 4 Oct 2016 at 9:54 AM, "vincent gromakowski" <vincent.gromakowski@gmail.com>
wrote:

> Hi,
> I know that Ignite has SQL support, but:
> - The ODBC driver doesn't seem to support HTTP(S), which is easier to
> integrate on corporate networks with rules, firewalls, and proxies.
> - The SQL engine doesn't seem to scale like Spark SQL would. For instance,
> Spark won't throw an OOM error if the dataset (source or result) doesn't
> fit in memory. On the Ignite side, it's not clear...
> - The Spark thriftserver can manage multi-tenancy: different users can
> connect to the same SQL engine and share the cache. In Ignite it's one
> cache per user, so a big waste of RAM.
>
> What I want to achieve is:
> - Use Cassandra as the data store, as it provides idempotence (HDFS/Hive
> doesn't), resulting in exactly-once semantics without any duplicates.
> - Use the Spark SQL thriftserver in multi-tenancy mode for large-scale ad hoc
> analytics queries (> TB) from an ODBC driver over HTTP(S).
> - Accelerate Cassandra reads when the data model of the Cassandra table
> doesn't fit the queries. Queries would be OLAP style: they target multiple
> C* partitions, with group-bys or filters on lots of dimensions that aren't
> necessarily in the C* table key.
>
> Thanks for your advice.
>
>
> 2016-10-04 6:51 GMT+02:00 Jörn Franke <jornfranke@gmail.com>:
>
>> I am not sure that this will be performant. What do you want to achieve
>> here? Fast lookups? Then the Cassandra Ignite store might be the right
>> solution. If you want to run more analytic-style queries, then you can put
>> the data on HDFS/Hive and use the Ignite HDFS cache to cache certain
>> partitions/tables in Hive in memory. If you want to run iterative machine
>> learning algorithms, you can use Spark on top of this. You can then
>> also use the Ignite cache for Spark RDDs.
>>
>> On 4 Oct 2016, at 02:24, Alexey Kuznetsov <akuznetsov@gridgain.com>
>> wrote:
>>
>> Hi, Vincent!
>>
>> Ignite also has SQL support (also scalable), and I think it will be much
>> faster to query Ignite directly than to query from Spark.
>> Also, please keep in mind that before executing queries you should load
>> all the needed data into the cache.
>> To load data from Cassandra into Ignite you may use the Cassandra store [1].
>>
>> [1] https://apacheignite.readme.io/docs/ignite-with-apache-cassandra
>>
>> On Tue, Oct 4, 2016 at 4:19 AM, vincent gromakowski <
>> vincent.gromakowski@gmail.com> wrote:
>>
>>> Hi,
>>> I am evaluating the possibility of using Spark SQL (and its scalability)
>>> over an Ignite cache with a Cassandra persistent store to speed up read
>>> workloads such as OLAP-style analytics.
>>> Is there any way to configure the Spark thriftserver to load an external
>>> table from Ignite, like we can do with Cassandra?
>>> Here is an example config for Spark backed by Cassandra:
>>>
>>> CREATE EXTERNAL TABLE MyHiveTable
>>>         ( id int, data string )
>>>         STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
>>>         TBLPROPERTIES ("cassandra.host" = "x.x.x.x",
>>>           "cassandra.ks.name" = "test",
>>>           "cassandra.cf.name" = "mytable",
>>>           "cassandra.ks.repfactor" = "1",
>>>           "cassandra.ks.strategy" =
>>>             "org.apache.cassandra.locator.SimpleStrategy" );
>>>
>>>
>>
>>
>> --
>> Alexey Kuznetsov
>>
>>
>
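
For comparison with the Hive storage handler example quoted above, the Spark Cassandra Connector also exposes a Spark SQL data source, so a Cassandra table can be registered from a thriftserver session without going through Hive. The following is only a minimal sketch under assumptions not stated in the thread: the connector JAR is on the thriftserver classpath, the Cassandra contact point is set through spark.cassandra.connection.host, and Spark is 2.x (Spark 1.x used CREATE TEMPORARY TABLE instead of CREATE TEMPORARY VIEW). The keyspace and table names are reused from the quoted example, and the view name my_cassandra_table is purely illustrative.

```sql
-- Sketch only: registers the Cassandra table test.mytable as a Spark SQL view.
-- Assumes the Spark Cassandra Connector is available to the thriftserver.
CREATE TEMPORARY VIEW my_cassandra_table
USING org.apache.spark.sql.cassandra
OPTIONS (
  keyspace "test",   -- same keyspace as in the quoted Hive example
  table "mytable"    -- same table (column family) as in the quoted example
);

-- OLAP-style query over the registered view; id and data are the columns
-- declared in the quoted example.
SELECT data, COUNT(*) AS cnt
FROM my_cassandra_table
GROUP BY data;
```

This does not answer the original question about Ignite; it only shows the Cassandra-backed variant in data source form, which is the pattern an Ignite-backed table would need an equivalent of.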
