From dev-return-40779-archive-asf-public=cust-asf.ponee.io@ignite.apache.org  Sun Oct 21 03:33:38 2018
MIME-Version: 1.0
References: <20cc20cec397bf94918644310df198df8dcebdbc.camel@gmail.com> <CAK0qHnqLXak-_mX55tz6CjJUoXOs2vL6Za6ch7WnqcM2M+upYA@mail.gmail.com>
In-Reply-To: <CAK0qHnqLXak-_mX55tz6CjJUoXOs2vL6Za6ch7WnqcM2M+upYA@mail.gmail.com>
From: Valentin Kulichenko <valentin.kulichenko@gmail.com>
Date: Sat, 20 Oct 2018 18:33:17 -0700
Message-ID: <CABuYRcrtVW0XdDRjxr-hXN9E5cbsjbsWd5mKJebtPxcknmEeFg@mail.gmail.com>
Subject: Re: [DISCUSSION] Spark Data Frame through Thin Client
To: dev@ignite.apache.org
Content-Type: multipart/alternative; boundary="00000000000095b6b60578b31f46"

--00000000000095b6b60578b31f46
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Guys,

From my experience, Ignite and Spark clusters typically run in the same
environment, which makes a client node the preferable option, mainly
because of performance. BTW, I doubt partition-awareness on the thin
client will help either, because in data frames we only run SQL queries
and I believe the thin client will execute them through a proxy anyway.
But correct me if I'm wrong.

Either way, it sounds like we just have usability issues with the
Ignite/Spark integration. Why don't we concentrate on fixing them then?
For example, #3 can be fixed by loading the XML content on the master and
then distributing it to the workers, instead of loading it on every worker
independently. Then there are certain procedures like deploying JARs, etc.
First of all, they will exist with the thin client as well. Second of all,
I'm sure there are ways to simplify these procedures and make integration
easier. My opinion is that working on such improvements is going to add
more value than another implementation based on the thin client.
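
To make the #3 idea concrete, here is a minimal sketch in plain Java. It
rests on loud assumptions: the class and method names
(ConfigDistributionSketch, loadOnMaster, rootElementName, runOnWorkers) are
hypothetical, a thread pool stands in for Spark workers, and parsing the
root element stands in for starting Ignite from the configuration. It only
shows the shape: read the XML once, then hand the workers the content
instead of a path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical illustration only; not Ignite or Spark code. */
class ConfigDistributionSketch {
    /** "Master" side: read the XML file once, into memory. */
    static String loadOnMaster(Path path) {
        try {
            byte[] bytes = Files.readAllBytes(path);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** "Worker" side: use the XML content; no local file is needed. */
    static String rootElementName(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Hands the same in-memory content to every simulated worker. */
    static List<String> runOnWorkers(String xml, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                Callable<String> task = () -> rootElementName(xml);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("Worker task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, only the master needs the file on disk, so the
"same absolute path on every worker" requirement goes away. How the string
would actually travel to the workers in the real integration is left open
here.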

-Val

On Sat, Oct 20, 2018 at 4:03 PM Denis Magda <dmagda@apache.org> wrote:

> Hello Nikolay,
>
> Your proposal sounds reasonable. However, I would suggest that we wait
> until partition-awareness is supported for the Java thin client first.
> With that feature, the client can connect to any node directly, while
> presently all the communication goes through a proxy (a node the client
> is connected to). That proxying is bad for performance.
>
>
> Vladimir, how hard would it be to support partition-awareness for the
> Java client? Probably, Nikolay can take over.
>
> --
> Denis
>
>
> On Sat, Oct 20, 2018 at 2:09 PM Nikolay Izhikov <nizhikov@apache.org>
> wrote:
>
> > Hello, Igniters.
> >
> > Currently, Spark Data Frame integration is implemented via a client
> > node connection.
> > Whenever we need to retrieve some data into a Spark worker (or master)
> > from Ignite, we start a client node.
> >
> > It has several major disadvantages:
> >
> >         1. We should copy the whole Ignite distribution onto each
> > Spark worker [1]
> >         2. We should copy the whole Ignite distribution onto the Spark
> > master to get the catalogue working.
> >         3. We should have the same absolute path to the Ignite
> > configuration file on every worker and provide it during data frame
> > construction [2]
> >         4. We should additionally configure the Spark workers'
> > classpath to include Ignite libraries.
> >
> > For now, almost all operations we need in the Spark Data Frame
> > integration are supported by the Java Thin Client:
> >         * obtain the list of caches.
> >         * get cache configuration.
> >         * execute SQL query.
> >         * stream data to the table - not supported by the thin client
> > for now, but can be implemented using simple SQL INSERT statements.
> >
> > Advantages of using the Java Thin Client in the Spark integration
> > (they are all known Java Thin Client advantages):
> >         1. Easy to configure: only IP addresses of server nodes are
> > required.
> >         2. Easy to deploy: only 1 additional jar required. No server
> > side (Ignite worker) configuration required.
> >
> > I propose to implement Spark Data Frame integration through Java Thin
> > Client.
> >
> > Thoughts?
> >
> > [1] https://apacheignite-fs.readme.io/docs/installation-deployment
> > [2]
> > https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-ignite-dataframe-options
> >
>
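
On the streaming bullet in Nikolay's quoted proposal (data streaming
implemented using simple SQL INSERT statements), a minimal sketch of what
that could look like, in plain Java with no Ignite or Spark dependency. The
names (InsertStreamSketch, insertSql, batches) are hypothetical, and the
table and column identifiers are assumed to be trusted, since they are
concatenated into the statement text. Only the row values go through "?"
parameters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; not an Ignite or Spark API. */
class InsertStreamSketch {
    /** Builds a parameterized INSERT for the given table and columns. */
    static String insertSql(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("No columns given");
        }
        String cols = String.join(", ", columns);
        // One "?" placeholder per column; values are bound separately.
        String params = String.join(", ",
            Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table
            + " (" + cols + ") VALUES (" + params + ")";
    }

    /** Splits rows into batches so each one can be sent as a unit. */
    static <T> List<List<T>> batches(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            out.add(new ArrayList<>(rows.subList(i, end)));
        }
        return out;
    }
}
```

For example, insertSql("person", ["id", "name"]) gives
"INSERT INTO person (id, name) VALUES (?, ?)". Each batch of rows would
then be bound to that statement and executed through whatever query call
the client exposes. Whether this is fast enough compared to a real data
streamer is an open question that the sketch does not answer.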

--00000000000095b6b60578b31f46--