From: Stephen Darlington
To: dev@ignite.apache.org
Subject: Re: [DISCUSSION] Spark Data Frame through Thin Client
Date: Mon, 22 Oct 2018 10:22:06 +0100

Are you suggesting making the Thin Client deployment an option, or a replacement for the thick client? If the latter, do we risk making future desirable changes more difficult (or impossible)? I’m thinking specifically about better support for Spark Streaming, where the lack of continuous query support in thin clients removes a significant optimisation option. I’m sure there are other use cases.

Regards,
Stephen

> On 21 Oct 2018, at 09:08, Nikolay Izhikov wrote:
>
> Valentin.
>
> It seems you made several assumptions that, from my point of view, are not always true:
>
> 1. "We have access to Spark cluster installation to perform deployment steps" - this is not true in cloud or enterprise environments.
>
> 2. "Spark cluster is used only for Ignite integration".
> From what I know, the computational resources of a big Spark cluster are divided among many business divisions.
> And it is not convenient to perform deployment steps on such a cluster.
>
> 3. "When Ignite + Spark are used in real production it's OK to have reasonable deployment overhead"
> What about a developer who wants to play with this integration,
> and wants to do it quickly to see how it works on real-life examples?
> Can we make their life much easier?
>
>> First of all, they will exist with thin client as well.
>
> Spark has the ability to deploy jars to workers and add them to the application task classpath.
> For 2.6 we must deploy 11 additional jars to start using Ignite.
> Please see my example at the bottom of the documentation page [1].
>
> Do cache-api-1.0.0.jar and h2-1.4.195.jar seem like obvious dependencies for Ignite integration to you?
> And to our users? :)
>
> Actually, the list of dependencies will change in 2.7: a new version of jcache, a new version of h2.
> So the user has to change it in code or perform additional deployment steps.
>
> It is overkill for me.
>
> On the other hand, the thin client requires only 1 jar.
> Moreover, the thin client protocol is backward compatible.
> So the thin client will keep working correctly when the Ignite cluster is updated from 2.6 to 2.7.
> So, with Spark integration via thin client we will be able to update the Ignite cluster and the Spark integration separately.
> For now, we have to do it in one big step.
>
> What do you think?
>
> [1] https://apacheignite-fs.readme.io/docs/installation-deployment
>
> On Sat, 20/10/2018 at 18:33 -0700, Valentin Kulichenko wrote:
>> Guys,
>>
>> From my experience, Ignite and Spark clusters typically run in the same
>> environment, which makes client node a more preferable option, mainly
>> because of performance. BTW, I doubt partition-awareness on thin client
>> will help either, because in dataframes we only run SQL queries and I
>> believe thin client will execute them through a proxy anyway. But correct
>> me if I’m wrong.
>>
>> Either way, it sounds like we just have usability issues with Ignite/Spark
>> integration. Why don’t we concentrate on fixing them then? For example, #3
>> can be fixed by loading XML content on master and then distributing it to
>> workers, instead of loading on every worker independently.
>> Then there are
>> certain procedures like deploying JARs, etc. First of all, they will exist
>> with thin client as well. Second of all, I’m sure there are ways to simplify
>> these procedures and make integration easier. My opinion is that working on
>> such improvements is going to add more value than another implementation
>> based on thin client.
>>
>> -Val
>>
>> On Sat, Oct 20, 2018 at 4:03 PM Denis Magda wrote:
>>
>>> Hello Nikolay,
>>>
>>> Your proposal sounds reasonable. However, I would suggest that we wait until
>>> partition-awareness is supported for the Java thin client first. With that
>>> feature, the client can connect to any node directly, while presently all
>>> the communication goes through a proxy (a node the client is connected to).
>>> Going through the proxy is bad for performance.
>>>
>>>
>>> Vladimir, how hard would it be to support partition-awareness for the Java
>>> client? Probably, Nikolay can take over.
>>>
>>> --
>>> Denis
>>>
>>>
>>> On Sat, Oct 20, 2018 at 2:09 PM Nikolay Izhikov
>>> wrote:
>>>
>>>> Hello, Igniters.
>>>>
>>>> Currently, the Spark Data Frame integration is implemented via a client node
>>>> connection.
>>>> Whenever we need to retrieve some data from Ignite into a Spark worker (or master),
>>>> we start a client node.
>>>>
>>>> It has several major disadvantages:
>>>>
>>>>   1. We have to copy the whole Ignite distribution onto each Spark
>>>> worker [1]
>>>>   2. We have to copy the whole Ignite distribution onto the Spark master to
>>>> get the catalog working.
>>>>   3. We must have the same absolute path to the Ignite configuration
>>>> file on every worker and provide it during data frame construction [2]
>>>>   4. We have to additionally configure the Spark workers' classpath to
>>>> include Ignite libraries.
>>>>
>>>> For now, almost all operations we need in the Spark Data Frame
>>>> integration are supported by the Java Thin Client:
>>>>   * obtain the list of caches.
>>>>   * get cache configuration.
>>>>   * execute SQL query.
>>>>   * stream data to the table - not supported by the thin client for
>>>> now, but can be implemented using simple SQL INSERT statements.
>>>>
>>>> Advantages of using the Java Thin Client in the Spark integration (they all follow
>>>> from the general Java Thin Client advantages):
>>>>   1. Easy to configure: only IP addresses of server nodes are
>>>> required.
>>>>   2. Easy to deploy: only 1 additional jar required. No server
>>>> side (Ignite worker) configuration required.
>>>>
>>>> I propose to implement Spark Data Frame integration through Java Thin
>>>> Client.
>>>>
>>>> Thoughts?
>>>>
>>>> [1] https://apacheignite-fs.readme.io/docs/installation-deployment
>>>> [2] https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-ignite-dataframe-options
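
The idea in the original proposal of streaming data to a table with simple SQL INSERT statements could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from Ignite or from the thread: the function names (insert_statement, batches, stream_rows), the "person" table, and the batch size are all hypothetical, and an in-memory sqlite3 connection stands in for whatever SQL call a thin client would expose.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple


def insert_statement(table: str, columns: Sequence[str]) -> str:
    """Build one parameterized single-row INSERT statement."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks})"


def batches(rows: Iterable[Tuple], size: int) -> Iterator[List[Tuple]]:
    """Split a row stream into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def stream_rows(conn, table: str, columns: Sequence[str],
                rows: Iterable[Tuple], batch_size: int = 2) -> int:
    """Send rows as plain INSERTs, one call per batch; return rows sent."""
    sql = insert_statement(table, columns)
    sent = 0
    for chunk in batches(rows, batch_size):
        # Assumption: sqlite3 is only a stand-in for a thin-client SQL call.
        conn.executemany(sql, chunk)
        sent += len(chunk)
    conn.commit()
    return sent


# Usage with the stand-in database and a hypothetical "person" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
sent = stream_rows(conn, "person", ["id", "name"],
                   [(1, "Ann"), (2, "Bob"), (3, "Cy")])
count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Batching keeps the number of calls to the server low while still using nothing beyond ordinary INSERT statements; whether this would perform acceptably compared with a dedicated streamer is exactly the kind of question the thread leaves open.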