From: Nikolay Izhikov (Николай Ижиков)
Date: Wed, 29 Nov 2017 14:01:19 +0300
Subject: Re: Optimization of SQL queries from Spark Data Frame to Ignite
To: dev@ignite.apache.org

Hello, Vladimir.

> partition pruning is already implemented in Ignite, so there is no need to do this on your own.

Spark works with partitioned data sets, so a custom Data Source (Ignite) is required to provide data partition information to Spark.

Can I get information about pruned partitions through some public API?
Is there a plan or ticket to implement such an API?

2017-11-29 10:34 GMT+03:00 Vladimir Ozerov:

> Nikolay,
>
> Regarding p3. - partition pruning is already implemented in Ignite, so
> there is no need to do this on your own.
>
> On Wed, Nov 29, 2017 at 3:23 AM, Valentin Kulichenko <
> valentin.kulichenko@gmail.com> wrote:
>
> > Nikolay,
> >
> > A custom strategy allows us to fully process the AST generated by Spark and
> > convert it to Ignite SQL, so there will be no execution on the Spark side at
> > all. This is what we are trying to achieve here. Basically, one will be
> > able to use the DataFrame API to execute queries directly on Ignite. Does it
> > make sense to you?
> >
> > I would recommend you take a look at the MemSQL implementation, which does
> > similar stuff: https://github.com/memsql/memsql-spark-connector
> >
> > Note that this approach will work only if all relations included in the AST
> > are Ignite tables. Otherwise, the strategy should return null so that Spark
> > falls back to its regular mode. Ignite will be used as a regular data source
> > in this case, and probably it's possible to implement some optimizations
> > here as well.
> > However, I never investigated this and it seems like another
> > separate discussion.
> >
> > -Val
> >
> > On Tue, Nov 28, 2017 at 9:54 AM, Николай Ижиков
> > wrote:
> >
> > > Hello, guys.
> > >
> > > I have implemented basic support of the Spark Data Frame API [1], [2] for
> > > Ignite.
> > > Spark provides an API for a custom strategy to optimize queries from Spark
> > > to the underlying data source (Ignite).
> > >
> > > The goals of the optimization (obvious, just to be on the same page):
> > > Minimize data transfer between Spark and Ignite.
> > > Speed up query execution.
> > >
> > > I see 3 ways to optimize queries:
> > >
> > > 1. *Join Reduce* If one makes a query that joins two or more
> > > Ignite tables, we have to pass all join info to Ignite and transfer to
> > > Spark only the result of the table join.
> > > To implement it we have to extend the current implementation with a new
> > > RelationProvider that can generate all kinds of joins for two or more
> > > tables.
> > > We should add some tests, also.
> > > The question is - how should the join result be partitioned?
> > >
> > > 2. *Order by* If one makes a query to an Ignite table with an order by
> > > clause, we can execute the sorting on the Ignite side.
> > > But it seems that currently Spark doesn't have any way to tell
> > > that partitions are already sorted.
> > >
> > > 3. *Key filter* If one makes a query with `WHERE key = XXX` or
> > > `WHERE key IN (X, Y, Z)`, we can reduce the number of partitions
> > > and query only the partitions that store those key values.
> > > Is this kind of optimization already built into Ignite, or should I
> > > implement it myself?
> > >
> > > Maybe there is some other way to make queries run faster?
> > >
> > > [1] https://spark.apache.org/docs/latest/sql-programming-guide.html
> > > [2] https://github.com/apache/ignite/pull/2742

-- 
Nikolay Izhikov
NIzhikov.dev@gmail.com
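
To make the *Key filter* idea from point 3 of the quoted message concrete, here is a minimal sketch of how a data source might narrow the set of partitions it scans when a query filters on specific keys. Everything in it is an assumption for illustration only: `partition_for` is a hypothetical hash-modulo stand-in for the cache's affinity function, `PARTITIONS` is an assumed partition count, and none of the names are Ignite or Spark APIs.

```python
from typing import Hashable, Iterable, Set

# Assumed partition count for this sketch; a real cache defines its own.
PARTITIONS = 1024


def partition_for(key: Hashable, partitions: int = PARTITIONS) -> int:
    """Hypothetical stand-in for an affinity function: key -> partition id."""
    return hash(key) % partitions


def partitions_for_keys(keys: Iterable[Hashable],
                        partitions: int = PARTITIONS) -> Set[int]:
    """Partitions that can hold any of the keys from a
    `WHERE key = X` or `WHERE key IN (X, Y, Z)` filter."""
    return {partition_for(k, partitions) for k in keys}


# WHERE key IN (1, 2, 1025): keys 1 and 1025 land in the same partition,
# so only two of the 1024 partitions need to be queried.
print(sorted(partitions_for_keys([1, 2, 1025])))  # [1, 2]
```

With a mapping like this, the data source could expose to Spark only the partitions in the returned set instead of all of them. Whether Ignite exposes its own pruning result through a public API is exactly the open question above.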