From: vincent gromakowski
Date: Tue, 4 Oct 2016 09:54:07 +0200
Subject: Re: spark SQL thriftserver over ignite and cassandra
To: user@ignite.apache.org

Hi,

I know that Ignite has SQL support, but:
- The ODBC driver doesn't seem to provide HTTP(S) support, which is easier to integrate on corporate networks with rules, firewalls and proxies.
- The SQL engine doesn't seem to scale like Spark SQL would. For instance, Spark won't generate an OOM if the dataset (source or result) doesn't fit in memory. On the Ignite side, it's not clear...
- Spark thrift can manage multi-tenancy: different users can connect to the same SQL engine and share the cache. In Ignite it's one cache per user, so a big waste of RAM.

What I want to achieve:
- Use Cassandra as the data store, as it provides idempotence (HDFS/Hive doesn't), resulting in exactly-once semantics without any duplicates.
- Use Spark SQL thriftserver in multi-tenancy for large-scale ad hoc analytics queries (> TB) from an ODBC driver through HTTP(S).
- Accelerate Cassandra reads when the data modeling of the Cassandra table doesn't fit the queries. Queries would be OLAP style: they target multiple C* partitions, with group-by or filters on lots of dimensions that aren't necessarily in the C* table key.

Thanks for your advice

2016-10-04 6:51 GMT+02:00 Jörn Franke <jornfranke@gmail.com>:

> I am not sure that this will be performant. What do you want to achieve
> here? Fast lookups? Then the Cassandra Ignite store might be the right
> solution. If you want to do more analytic style of queries then you can put
> the data on HDFS/Hive and use the Ignite HDFS cache to cache certain
> partitions/tables in Hive in-memory. If you want to go to iterative machine
> learning algorithms you can go for Spark on top of this. You can use then
> also Ignite cache for Spark RDDs.
>
> On 4 Oct 2016, at 02:24, Alexey Kuznetsov <akuznetsov@gridgain.com> wrote:
>
> Hi, Vincent!
>
> Ignite also has SQL support (also scalable), I think it will be much
> faster to query directly from Ignite than query from Spark.
> Also please mind, that before executing queries you should load all needed
> data to cache.
> To load data from Cassandra to Ignite you may use Cassandra store [1].
>
> [1] https://apacheignite.readme.io/docs/ignite-with-apache-cassandra
>
> On Tue, Oct 4, 2016 at 4:19 AM, vincent gromakowski <
> vincent.gromakowski@gmail.com> wrote:
>
>> Hi,
>> I am evaluating the possibility to use Spark SQL (and its scalability)
>> over an Ignite cache with Cassandra persistent store to increase read
>> workloads like OLAP style analytics.
>> Is there any way to configure Spark thriftserver to load an external
>> table in Ignite like we can do in Cassandra ?
>> Here is an example of config for spark backed by cassandra
>>
>> CREATE EXTERNAL TABLE MyHiveTable
>>         ( id int, data string )
>>         STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
>>         TBLPROPERTIES ("cassandra.host" = "x.x.x.x", "cassandra.ks.name" = "test",
>>           "cassandra.cf.name" = "mytable",
>>           "cassandra.ks.repfactor" = "1",
>>           "cassandra.ks.strategy" =
>>             "org.apache.cassandra.locator.SimpleStrategy" );
>>
>
>
> --
> Alexey Kuznetsov
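
For readers wondering what "through HTTP(S)" could look like on the client side, here is a minimal sketch. It assumes the Spark thriftserver is started in HTTP transport mode (hive.server2.transport.mode=http) and that the client uses the HiveServer2 JDBC URL format; an ODBC DSN would carry equivalent settings but is configured differently. The helper name, the host name and the default values (port 10001, path "cliservice") are illustrative assumptions, not something stated in this thread.

```python
def thrift_http_jdbc_url(host, port=10001, database="default",
                         http_path="cliservice", use_ssl=False):
    """Build a HiveServer2-style JDBC URL for a Thrift server in HTTP mode.

    Hypothetical helper for illustration; the defaults mirror the usual
    HiveServer2 HTTP settings and must match the server configuration.
    """
    # Session variables follow the database name, separated by semicolons.
    params = ["transportMode=http", f"httpPath={http_path}"]
    if use_ssl:
        # ssl=true selects HTTPS; trust store options are left out of this sketch.
        params.append("ssl=true")
    return f"jdbc:hive2://{host}:{port}/{database};" + ";".join(params)


if __name__ == "__main__":
    # "thrift.example.internal" is a placeholder host name.
    print(thrift_http_jdbc_url("thrift.example.internal", use_ssl=True))
```

Because the transport is plain HTTP(S), such an endpoint can sit behind a corporate proxy or gateway, which is the integration concern raised in the first bullet of the message.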