Subject: Re: spark SQL thriftserver over ignite and cassandra
From: Denis Magda <dmagda@gridgain.com>
Date: Wed, 5 Oct 2016 15:12:14 -0700
Cc: Igor Sapego
To: user@ignite.apache.org

Vincent,

Please see below.

> On Oct 5, 2016, at 4:31 AM, vincent gromakowski <vincent.gromakowski@gmail.com> wrote:
>
> Hi,
> Thanks for your explanations. Please find more questions inline.
>
> Vincent
>
> 2016-10-05 3:33 GMT+02:00 Denis Magda <dmagda@gridgain.com>:
> Hi Vincent,
>
> See my answers inline.
>
>> On Oct 4, 2016, at 12:54 AM, vincent gromakowski <vincent.gromakowski@gmail.com> wrote:
>>
>> Hi,
>> I know that Ignite has SQL support, but:
>> - The ODBC driver doesn't seem to provide HTTP(S) support, which is easier to integrate on corporate networks with rules, firewalls, and proxies.
>
> Igor Sapego, what URIs are supported presently?
>
>> - The SQL engine doesn't seem to scale the way Spark SQL would. For instance, Spark won't generate an OOM if the dataset (source or result) doesn't fit in memory. From the Ignite side, it's not clear…
>
> OOM is not related to scalability at all; it is a matter of the application's logic.
>
> The Ignite SQL engine scales out along with your cluster.
> Moreover, Ignite supports indexes, which give you O(log N) running time for your SQL queries, whereas with Spark you will face full scans (O(N)) all the time.
>
> However, to benefit from Ignite SQL queries you have to put all the data in memory. Ignite doesn't go to a CacheStore (Cassandra, a relational database, MongoDB, etc.) while a SQL query is executed, and it won't preload anything from an underlying CacheStore. Automatic preloading works for key-value operations like cache.get(key).
>
> This is an issue because I will potentially have to query TBs of data. If I use the Spark thriftserver backed by an IgniteRDD, does it solve this point, and can I get automatic preloading from C*?

An IgniteRDD will load missing key-value tuples from Cassandra, because an IgniteRDD is essentially an IgniteCache and Cassandra is its CacheStore. The only thing left to check is whether the Spark thriftserver can work with IgniteRDDs. I hope you will be able to figure this out and share your feedback with us.

>> - Spark thrift can manage multi-tenancy: different users can connect to the same SQL engine and share the cache. In Ignite it's one cache per user, so a big waste of RAM.
>
> Everyone can connect to an Ignite cluster and work with the same set of distributed caches. I'm not sure why you would need to create caches with the same content for every user.
>
> It's a security issue: an Ignite cache doesn't provide multiple user accounts per cache. I am thinking of using Spark to authenticate multiple users, and then having Spark use a shared account on the Ignite cache.

Basically, Ignite provides basic security interfaces and some implementations that you can rely on when building your secure solution.
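Denis's complexity point above (indexed O(log N) lookups versus O(N) full scans) can be illustrated outside Ignite with a toy sketch in plain Python; nothing here is Ignite-specific, and the "table" and "index" are illustrative stand-ins:

```python
import bisect

# Toy "table" of (key, value) rows in arbitrary order; an unindexed
# engine must touch every row to answer a key lookup.
rows = [(i * 7919 % 10007, f"val-{i}") for i in range(10007)]

def full_scan(rows, key):
    """O(N): examine every row, as an unindexed query would."""
    steps = 0
    for k, v in rows:
        steps += 1
        if k == key:
            return v, steps
    return None, steps

# Build a sorted index over the key column, as CREATE INDEX would.
index = sorted(rows)
keys = [k for k, _ in index]

def indexed_lookup(keys, index, key):
    """O(log N): binary search over the sorted index."""
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return index[pos][1]
    return None

# Looking up the key of the last row: the scan walks all 10007 rows,
# while the indexed lookup needs ~14 comparisons.
target = rows[-1][0]
value_scan, scan_steps = full_scan(rows, target)
value_idx = indexed_lookup(keys, index, target)
```

The same trade-off is why the thread contrasts Ignite's indexed SQL with Spark's scan-based execution over raw RDDs.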
> This article may be useful for your case: http://smartkey.co.uk/development/securing-an-apache-ignite-cluster/
>
> —
> Denis
>
> If you need real multi-tenancy support, where cacheA may be accessed only by users from group A and cacheB only by users from group B, then you can take a look at GridGain, which is built on top of Ignite:
> https://gridgain.readme.io/docs/multi-tenancy
>
> OK, but I am evaluating open-source-only solutions (Kylin, Druid, Alluxio...); it's a constraint from my hierarchy.
>>
>> What I want to achieve is:
>> - use Cassandra as the data store, as it provides idempotence (HDFS/Hive doesn't), resulting in exactly-once semantics without any duplicates;
>> - use the Spark SQL thriftserver in multi-tenancy for large-scale ad-hoc analytics queries (> TB) from an ODBC driver through HTTP(S);
>> - accelerate Cassandra reads when the data modeling of the Cassandra table doesn't fit the queries. Queries would be OLAP-style: targeting multiple C* partitions, with group-bys or filters on lots of dimensions that aren't necessarily in the C* table key.
>
> As was mentioned, Ignite uses Cassandra as a CacheStore; you should keep this in mind. Before trying to assemble the whole chain, I would recommend trying to connect the Spark SQL thrift server directly to Ignite and working with its shared RDDs [1]. A shared RDD (basically an Ignite cache) can be backed by Cassandra. This chain will probably work for you, but I can't give more precise guidance on it.
>
> I will try to make it work and give you feedback.
>
> [1] https://apacheignite-fs.readme.io/docs/ignite-for-spark
>
> —
> Denis
>
>> Thanks for your advice
>>
>> 2016-10-04 6:51 GMT+02:00 Jörn Franke <jornfranke@gmail.com>:
>> I am not sure that this will be performant. What do you want to achieve here? Fast lookups? Then the Cassandra Ignite store might be the right solution.
>> If you want to do more analytic-style queries, you can put the data on HDFS/Hive and use the Ignite HDFS cache to cache certain partitions/tables of Hive in memory. If you want to move on to iterative machine learning algorithms, you can go for Spark on top of this; you can then also use the Ignite cache for Spark RDDs.
>>
>> On 4 Oct 2016, at 02:24, Alexey Kuznetsov <akuznetsov@gridgain.com> wrote:
>>
>>> Hi, Vincent!
>>>
>>> Ignite also has SQL support (also scalable); I think it will be much faster to query directly from Ignite than to query from Spark.
>>> Also please mind that before executing queries you should load all the needed data into the cache.
>>> To load data from Cassandra into Ignite you may use the Cassandra store [1].
>>>
>>> [1] https://apacheignite.readme.io/docs/ignite-with-apache-cassandra
>>>
>>> On Tue, Oct 4, 2016 at 4:19 AM, vincent gromakowski <vincent.gromakowski@gmail.com> wrote:
>>> Hi,
>>> I am evaluating the possibility of using Spark SQL (and its scalability) over an Ignite cache with a Cassandra persistent store, to speed up read workloads like OLAP-style analytics.
>>> Is there any way to configure the Spark thriftserver to load an external table in Ignite, like we can do with Cassandra?
>>> Here is an example of the config for Spark backed by Cassandra:
>>>
>>> CREATE EXTERNAL TABLE MyHiveTable
>>>   ( id int, data string )
>>>   STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
>>>   TBLPROPERTIES ("cassandra.host" = "x.x.x.x", "cassandra.ks.name" = "test",
>>>     "cassandra.cf.name" = "mytable",
>>>     "cassandra.ks.repfactor" = "1",
>>>     "cassandra.ks.strategy" =
>>>       "org.apache.cassandra.locator.SimpleStrategy");
>>>
>>> --
>>> Alexey Kuznetsov
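The read-through behavior discussed in this thread can be modeled with a toy sketch as well: a key-value get() falls through to the backing store on a miss, while a scan/SQL-style query sees only what is already in memory. This is plain Python; ToyCache and ToyCacheStore are illustrative names, not Ignite APIs.

```python
class ToyCacheStore:
    """Stands in for the Cassandra-backed CacheStore."""
    def __init__(self, data):
        self.data = data

    def load(self, key):
        return self.data.get(key)


class ToyCache:
    """Read-through cache: get() loads misses from the store,
    but scan_query() only sees entries already in memory."""
    def __init__(self, store):
        self.store = store
        self.mem = {}

    def get(self, key):
        if key not in self.mem:           # miss: read through to the store
            val = self.store.load(key)
            if val is not None:
                self.mem[key] = val
        return self.mem.get(key)

    def scan_query(self, predicate):
        # Like an Ignite SQL query, this runs over in-memory data only;
        # nothing is preloaded from the store.
        return {k: v for k, v in self.mem.items() if predicate(k, v)}


store = ToyCacheStore({1: "a", 2: "b", 3: "c"})
cache = ToyCache(store)
cache.get(1)                              # read-through pulls key 1 into memory
hits = cache.scan_query(lambda k, v: True)
# hits contains only key 1; keys 2 and 3 stay in the store until get() touches them
```

This mirrors Denis's warning: a query layer on top of the cache answers from what has been loaded, so TBs of data sitting only in Cassandra are invisible to it until they are brought in via key-value access or explicit preloading.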