From: Li Yang
Date: Sat, 25 Mar 2017 07:31:24 +0800
Subject: Re: Kylin + SparkSQL integration
To: user@kylin.apache.org
Cc: liyang@apache.org

> taking advantage of underlying datasource capabilities (predicate pushdown, projection, etc.) is important to improve query performance.

That is very true. There was a discussion about replacing HBase with Cassandra previously, and the worry was that the lack of a coprocessor would prevent predicate & aggregation pushdown. A similar concern exists for Kudu.

Cheers
Yang

On Fri, Mar 24, 2017 at 12:50 AM, Nirav Patel wrote:

> Thanks for logging those improvements. I think the decision about replacing
> HBase or using any other NoSQL datastore for storing cubes would be based
> on many factors, but one important factor I can think of is the query
> engine/optimizer of each of those datasources. I think taking advantage of
> underlying datasource capabilities (predicate pushdown, projection, etc.) is
> important to improve query performance.
>
> Cheers,
> Nirav
>
> On Mon, Mar 20, 2017 at 12:23 PM, Li Yang wrote:
>
>> Hi Nirav,
>>
>> Glad to see you on the mailing list!!
>>
>> Yes, this is a great idea and it is on the roadmap. (This reminds me, I
>> should update the roadmap on the Kylin website soon.)
>>
>> However, there are many moving parts that affect how we approach it, e.g.
>>
>> - If the coprocessor is retired, do we still need HBase?
>> - If HBase is retired, what is the alternative storage? How about
>> metadata?
>> - There are other ways to integrate SparkSQL (KYLIN-2515); how do they
>> fit in?
>>
>> There is a lot of work in this direction, I would say.
>>
>> Cheers
>> Yang
>>
>> On Tue, Mar 21, 2017 at 2:05 AM, Nirav Patel
>> wrote:
>>
>>> Hi,
>>>
>>> At the recent Strata conference I asked whether Kylin could support
>>> SparkSQL as a query engine, or have a Kylin query result set converted
>>> into a Spark Dataset (DataFrame) on which users can perform further
>>> distributed computation.
>>> The reasons are:
>>> 1) Some flavors of HBase don't support coprocessors.
>>> 2) SparkSQL UDFs are much easier to develop than HBase coprocessors.
>>> 3) Users can write their own Spark UDFs and run any custom aggregation.
>>>
>>> Is this on the roadmap?
>>>
>>> Thanks,
>>> Nirav
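
Both sides of the thread point to predicate, projection, and aggregation pushdown as the reason the storage choice matters. The Python sketch below is a toy illustration of that idea only, not Kylin or HBase code: the in-memory "regions", the row layout, and every function name (`scan_with_pushdown`, `scan_without_pushdown`, `query`, `in_2017`) are hypothetical. The pushdown path plays the role the thread assigns to the HBase coprocessor (filtering and pre-aggregating next to the data); the sketch simply counts how many rows have to leave storage in each case.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical row/predicate types for this toy example.
Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def scan_without_pushdown(region: List[Row]) -> List[Row]:
    # No pushdown: every row, with every column, leaves the storage node.
    return [dict(row) for row in region]


def scan_with_pushdown(region: List[Row], predicate: Predicate,
                       group_by: str, measure: str) -> List[Row]:
    # Pushdown: filter, project down to (group_by, measure) and pre-aggregate
    # next to the data, so only one partial sum per group leaves the node.
    partial: Dict[object, float] = defaultdict(float)
    for row in region:
        if predicate(row):
            partial[row[group_by]] += row[measure]
    return [{group_by: key, measure: total} for key, total in partial.items()]


def query(regions: List[List[Row]], predicate: Predicate, group_by: str,
          measure: str, pushdown: bool) -> Tuple[Dict[object, float], int]:
    """SELECT group_by, SUM(measure) ... WHERE predicate GROUP BY group_by.

    Returns the merged result and the number of rows shipped to the engine.
    """
    totals: Dict[object, float] = defaultdict(float)
    shipped = 0
    for region in regions:
        if pushdown:
            # Rows are already filtered partial sums; the engine only merges.
            rows = scan_with_pushdown(region, predicate, group_by, measure)
            shipped += len(rows)
        else:
            raw = scan_without_pushdown(region)
            shipped += len(raw)
            # The engine has to filter the raw rows itself.
            rows = [row for row in raw if predicate(row)]
        for row in rows:
            totals[row[group_by]] += row[measure]
    return dict(totals), shipped


# Hypothetical sample data: a small fact table split across two "regions".
REGIONS: List[List[Row]] = [
    [{"market": "EU", "year": 2016, "sales": 10.0, "note": "a"},
     {"market": "US", "year": 2017, "sales": 5.0, "note": "b"},
     {"market": "US", "year": 2017, "sales": 7.0, "note": "c"}],
    [{"market": "EU", "year": 2017, "sales": 3.0, "note": "d"},
     {"market": "US", "year": 2016, "sales": 1.0, "note": "e"}],
]


def in_2017(row: Row) -> bool:
    return row["year"] == 2017


if __name__ == "__main__":
    print(query(REGIONS, in_2017, "market", "sales", pushdown=False))
    # ({'US': 12.0, 'EU': 3.0}, 5)
    print(query(REGIONS, in_2017, "market", "sales", pushdown=True))
    # ({'US': 12.0, 'EU': 3.0}, 2)
```

Both paths return the same answer; the difference is that the pushdown path ships two pre-aggregated rows instead of five raw ones. Only that path needs code running on the storage side, which is the piece the thread worries about losing on stores without a coprocessor mechanism.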