From: Dmitriy Setrakyan
Date: Sat, 5 Aug 2017 00:41:23 +0200
Subject: Re: Spark Data Frame support in Ignite
To: dev@ignite.apache.org
Content-Type: multipart/alternative; boundary="f403045e262cc0ae810555f5349c"

--f403045e262cc0ae810555f5349c
Content-Type: text/plain; charset="UTF-8"

On Thu, Aug 3, 2017 at 9:04 PM, Valentin Kulichenko <valentin.kulichenko@gmail.com> wrote:

> This JDBC integration is just a Spark data source, which means that Spark
> will fetch data into its local memory first, and only then apply filters,
> aggregations, etc. This is obviously slow and doesn't use all the advantages
> Ignite provides.
>
> To create a useful and valuable integration, we should create a custom
> Strategy that will convert Spark's logical plan into a SQL query and
> execute it directly on Ignite.
>

I get it, but we have been talking about Data Frame support for longer than
a year. I think we should advise our users to switch to JDBC until the
community gets someone to implement it.

>
> -Val
>
> On Thu, Aug 3, 2017 at 12:12 AM, Dmitriy Setrakyan wrote:
>
> > On Thu, Aug 3, 2017 at 9:04 AM, Jörn Franke wrote:
> >
> > > I think the development effort would still be higher. Everything would
> > > have to be put via JDBC into Ignite, then checkpointing would have to be
> > > done via JDBC (again additional development effort), with a lot of conversion
> > > from Spark's internal format to JDBC and back to Ignite's internal format.
> > > Pagination I do not see as a useful feature for managing large data volumes
> > > from databases - on the contrary, it is very inefficient (and one would
> > > have to implement logic to fetch all pages). Pagination was also never
> > > intended for fetching large data volumes, but for web pages showing a
> > > small result set over several pages, where the user can click manually for
> > > the next page (which they do not do most of the time anyway).
> > >
> > > While it might be a quick solution, I think a deeper integration than
> > > JDBC would be more beneficial.
> > >
> >
> > Jorn, I completely agree. However, we have not been able to find a
> > contributor for this feature. You sound like you have sufficient domain
> > expertise in Spark and Ignite. Would you be willing to help out?
> >
> >
> > > > On 3. Aug 2017, at 08:57, Dmitriy Setrakyan wrote:
> > > >
> > > >> On Thu, Aug 3, 2017 at 8:45 AM, Jörn Franke wrote:
> > > >>
> > > >> I think the JDBC one is more inefficient, slower, and requires too much
> > > >> development effort. You can also check the integration of Alluxio with
> > > >> Spark.
> > > >>
> > > >
> > > > As far as I know, Alluxio is a file system, so it cannot use JDBC. Ignite,
> > > > on the other hand, is an SQL system and works well with JDBC. As far as the
> > > > development effort goes, we are dealing with SQL, so I am not sure why JDBC
> > > > would be harder.
> > > >
> > > > Generally speaking, until Ignite provides native data frame integration,
> > > > having JDBC-based integration out of the box is minimally acceptable.
> > > >
> > > >
> > > >> Then, in general I think JDBC was never designed for large data volumes.
> > > >> It is for executing queries and getting a small or aggregated result set
> > > >> back. Alternatively, for inserting / updating single rows.
> > > >>
> > > >
> > > > Agree in general. However, Ignite JDBC is designed to work with larger data
> > > > volumes and supports data pagination automatically.
> > > >
> > > >
> > > >>> On 3. Aug 2017, at 08:17, Dmitriy Setrakyan wrote:
> > > >>>
> > > >>> Jorn, thanks for your feedback!
> > > >>>
> > > >>> Can you explain how the direct support would be different from the JDBC
> > > >>> support?
> > > >>>
> > > >>> Thanks,
> > > >>> D.
> > > >>>
> > > >>>> On Thu, Aug 3, 2017 at 7:40 AM, Jörn Franke wrote:
> > > >>>>
> > > >>>> These are two different things. Spark applications themselves do not use
> > > >>>> JDBC - it is more for non-Spark applications to access Spark DataFrames.
> > > >>>>
> > > >>>> Direct support by Ignite would make more sense. Although in theory you
> > > >>>> have IGFS, that only helps if the user is using HDFS, which might not be
> > > >>>> the case. It is now also very common to use object stores, such as S3.
> > > >>>> Direct support could be leveraged for interactive analysis or for different
> > > >>>> Spark applications sharing data.
> > > >>>>
> > > >>>>> On 3. Aug 2017, at 05:12, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
> > > >>>>>
> > > >>>>> Igniters,
> > > >>>>>
> > > >>>>> We have had the integration with Spark Data Frames on our roadmap for a
> > > >>>>> while:
> > > >>>>> https://issues.apache.org/jira/browse/IGNITE-3084
> > > >>>>>
> > > >>>>> However, while browsing the Spark documentation, I came across the generic
> > > >>>>> JDBC data frame support in Spark:
> > > >>>>> https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
> > > >>>>>
> > > >>>>> Given that Ignite has a JDBC driver, does it mean that it transitively also
> > > >>>>> supports Spark data frames? If yes, we should document it.
> > > >>>>>
> > > >>>>> D.
> > > >>>>
> > > >>
> > >

--f403045e262cc0ae810555f5349c--