From: Valentin Kulichenko
Date: Sat, 20 Oct 2018 18:33:17 -0700
Subject: Re: [DISCUSSION] Spark Data Frame through Thin Client
To: dev@ignite.apache.org
Guys,

From my experience, Ignite and Spark clusters typically run in the same
environment, which makes the client node the preferable option, mainly
because of performance.

BTW, I doubt partition awareness in the thin client will help here, because
in data frames we only run SQL queries, and I believe the thin client will
execute them through a proxy anyway. But correct me if I'm wrong.

Either way, it sounds like we just have usability issues with the
Ignite/Spark integration. Why don't we concentrate on fixing them then? For
example, #3 can be fixed by loading the XML content on the master and then
distributing it to the workers, instead of loading it on every worker
independently. Then there are certain procedures like deploying JARs, etc.
First of all, they will exist with the thin client as well. Second of all,
I'm sure there are ways to simplify these procedures and make the
integration easier.

My opinion is that working on such improvements is going to add more value
than another implementation based on the thin client.

-Val

On Sat, Oct 20, 2018 at 4:03 PM Denis Magda wrote:

> Hello Nikolay,
>
> Your proposal sounds reasonable. However, I would suggest that we wait
> until partition awareness is supported by the Java thin client first. With
> that feature, the client can connect to any node directly, while presently
> all the communication goes through a proxy (the node the client is
> connected to). All of that is bad for performance.
>
>
> Vladimir, how hard would it be to support partition awareness for the Java
> client? Probably, Nikolay can take over.
>
> --
> Denis
>
>
> On Sat, Oct 20, 2018 at 2:09 PM Nikolay Izhikov
> wrote:
>
> > Hello, Igniters.
> >
> > Currently, the Spark Data Frame integration is implemented via a client
> > node connection.
> > Whenever we need to retrieve some data into a Spark worker (or master)
> > from Ignite, we start a client node.
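Denis's point about the proxy hop, and Val's doubt that partition awareness helps SQL-based data frames, can be illustrated with a simplified routing sketch. It is not Ignite's implementation: a plain modulo hash stands in for the real affinity function (RendezvousAffinityFunction by default), and the class name, node names and partition layout are hypothetical. In this sketch, key-based operations go straight to the node owning the key's partition, while a SQL query has no single owning key and still goes to the connected node.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Simplified partition-aware routing for a thin client (illustrative only).
 * Hypothetical: a plain modulo hash stands in for Ignite's real affinity
 * function, and the partition-to-primary-node layout is supplied by hand.
 */
public class PartitionAwareRouting {

    /** Maps a key to a partition in [0, parts); floorMod keeps negative hashes in range. */
    static int partition(Object key, int parts) {
        return Math.floorMod(key.hashCode(), parts);
    }

    /** Key-based operation (get/put): send it straight to the partition's primary node. */
    static String routeKeyOperation(Object key, List<String> primaryByPartition) {
        return primaryByPartition.get(partition(key, primaryByPartition.size()));
    }

    /** SQL query: no single owning key, so it goes to the connected node, which proxies it. */
    static String routeSqlQuery(String connectedNode) {
        return connectedNode;
    }

    public static void main(String[] args) {
        // Hypothetical layout: 4 partitions spread over two server nodes.
        List<String> primaryByPartition = Arrays.asList("node-A", "node-B", "node-A", "node-B");
        String connectedNode = "node-A";

        System.out.println("key 6 -> " + routeKeyOperation(6, primaryByPartition)); // partition 2
        System.out.println("key 7 -> " + routeKeyOperation(7, primaryByPartition)); // partition 3
        System.out.println("SQL   -> " + routeSqlQuery(connectedNode));
    }
}
```

Without partition awareness, the key-based case would also go through the connected node, which is the extra hop Denis describes; the SQL case looks the same either way in this model.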
> >
> > It has several major disadvantages:
> >
> > 1. We have to copy the whole Ignite distribution onto each Spark
> > worker [1].
> > 2. We have to copy the whole Ignite distribution onto the Spark master
> > to get the catalog working.
> > 3. We have to have the same absolute path to the Ignite configuration
> > file on every worker and provide it during data frame construction [2].
> > 4. We have to additionally configure the Spark workers' classpath to
> > include the Ignite libraries.
> >
> > For now, almost all operations we need in the Spark Data Frame
> > integration are supported by the Java Thin Client:
> > * obtain the list of caches;
> > * get a cache configuration;
> > * execute a SQL query;
> > * stream data to a table - not supported by the thin client for
> > now, but can be implemented using simple SQL INSERT statements.
> >
> > Advantages of using the Java Thin Client in the Spark integration (they
> > are all known Java Thin Client advantages):
> > 1. Easy to configure: only the IP addresses of the server nodes are
> > required.
> > 2. Easy to deploy: only 1 additional jar is required. No server-side
> > (Ignite worker) configuration is required.
> >
> > I propose to implement the Spark Data Frame integration through the
> > Java Thin Client.
> >
> > Thoughts?
> >
> > [1] https://apacheignite-fs.readme.io/docs/installation-deployment
> > [2]
> > https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-ignite-dataframe-options
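The proposal notes that streaming data to a table is not supported by the thin client but can be emulated with plain SQL INSERT statements. A minimal sketch of that workaround, assuming rows are batched into one parameterized multi-row INSERT: the class, table and column names are hypothetical, and the commented line at the end only indicates how the statement would likely be submitted through the Java thin client (`IgniteClient.query(SqlFieldsQuery)`, assumed available in the Ignite version in use). That call is not exercised here because it needs a running cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Sketch: turn a batch of rows into one parameterized multi-row INSERT,
 * as a stand-in for data streaming. Table/column names are hypothetical.
 */
public class SqlInsertBatcher {

    /** Builds "INSERT INTO t (c1, c2) VALUES (?, ?), (?, ?)" for rowCount rows. */
    static String insertSql(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount <= 0)
            throw new IllegalArgumentException("need at least one column and one row");

        String placeholders = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES "
            + String.join(", ", Collections.nCopies(rowCount, placeholders));
    }

    /** Flattens rows into the positional argument array matching insertSql. */
    static Object[] flattenArgs(List<List<Object>> rows, int columnCount) {
        List<Object> args = new ArrayList<>();
        for (List<Object> row : rows) {
            if (row.size() != columnCount)
                throw new IllegalArgumentException("row width mismatch: " + row);
            args.addAll(row);
        }
        return args.toArray();
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("id", "name");
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList(1, "Alice"),
            Arrays.<Object>asList(2, "Bob"));

        String sql = insertSql("person", cols, rows.size());
        Object[] params = flattenArgs(rows, cols.size());

        System.out.println(sql);                      // INSERT INTO person (id, name) VALUES (?, ?), (?, ?)
        System.out.println(Arrays.toString(params));  // [1, Alice, 2, Bob]
        // With the Java thin client, this pair would likely be sent as:
        // client.query(new SqlFieldsQuery(sql).setArgs(params)).getAll();
    }
}
```

Packing several rows into one statement trades a larger statement for fewer round trips; the batch size is a tuning choice the thread does not specify.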