From: Valentin Kulichenko
Date: Fri, 2 Dec 2016 11:46:22 -0800
Subject: Re: ignite-spark module in Hadoop Accelerator
To: dev@ignite.apache.org
Cc: Vladimir Ozerov

In general, I don't quite understand why we should move any component
outside of Fabric. The concept of Fabric is to have everything, no? :) In
other words, if a cluster was once set up for Hadoop Acceleration, why not
allow creating a cache and/or running a task using native Ignite APIs
sometime later? We follow this approach with all our components and
modules, but not with ignite-hadoop for some reason.

If we get rid of the Hadoop Accelerator build, the initial setup of the
Hadoop integration can potentially become a bit more complicated, but with
proper documentation I don't think this is going to be a problem, because
it requires multiple steps now anyway. And frankly the same can be said
about any optional module we have - enabling it requires some additional
steps, as it doesn't work out of the box.

-Val

On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda wrote:

> Dmitriy,
>
> > - the "lib/" folder has far fewer libraries than in fabric, simply
> > because many dependencies don't make sense for a hadoop environment
>
> This is exactly why the discussion moved in this direction.
>
> How do we decide what should be a part of Hadoop Accelerator and what
> should be excluded? If you read through Val's and Cos's comments below
> you'll get more insights.
>
> In general, we need a clear understanding of what the Hadoop Accelerator
> distribution's use case is. This will help us come to a final decision.
>
> If the accelerator is supposed to be plugged into an existing Hadoop
> environment by enabling MapReduce and/or IGFS at the configuration level,
> then we should simply remove the ignite-indexing and ignite-spark modules
> and add additional logging libs as well as the AWS and GCE integration
> packages.
>
> But wait, what if a user wants to leverage the Ignite Spark integration,
> Ignite SQL or geospatial queries, or Ignite streaming capabilities after
> he has already plugged in the accelerator? What if he is ready to modify
> his existing code? He can't simply switch to the fabric on the
> application side, because the fabric doesn't include the accelerator's
> libs that are still needed. He can't rely solely on the accelerator
> distribution either, since it misses some libs. And, obviously, the user
> starts shuffling libs between the fabric and the accelerator to get what
> is required.
>
> Vladimir, can you share your thoughts on this?
>
> —
> Denis
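For reference, the "configuration-level" plug-in described above normally
boils down to a handful of Hadoop properties that point MapReduce at the
Ignite job tracker and register IGFS as a Hadoop file system. Below is a
minimal sketch using the Hadoop Configuration API; the property names are
the ones I recall from the accelerator docs, and the endpoint value is a
placeholder, so double-check both against your setup:

    import org.apache.hadoop.conf.Configuration;

    public class IgniteAcceleratorConfSketch {
        /** Builds a Hadoop configuration that routes MR jobs and igfs:// paths to Ignite. */
        public static Configuration igniteAcceleratedConf() {
            Configuration conf = new Configuration();

            // Run MapReduce jobs through the Ignite job tracker instead of YARN.
            conf.set("mapreduce.framework.name", "ignite");
            conf.set("mapreduce.jobtracker.address", "localhost:11211"); // placeholder host:port

            // Register the IGFS file system implementations shipped in ignite-hadoop.
            conf.set("fs.igfs.impl",
                "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem");
            conf.set("fs.AbstractFileSystem.igfs.impl",
                "org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem");

            return conf;
        }
    }

In practice the same properties live in core-site.xml and mapred-site.xml,
which is what the "config/hadoop" folder of the accelerator build
pre-packages.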
>
> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan wrote:
> >
> > Guys,
> >
> > I just downloaded the hadoop accelerator and here are the differences
> > from the fabric edition that jump at me right away:
> >
> >    - the "bin/" folder has "setup-hadoop" scripts
> >    - the "config/" folder has a "hadoop" subfolder with the necessary
> >    hadoop-related configuration
> >    - the "lib/" folder has far fewer libraries than in fabric, simply
> >    because many dependencies don't make sense for a hadoop environment
> >
> > I currently don't see how we can merge the hadoop accelerator with the
> > standard fabric edition.
> >
> > D.
> >
> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda wrote:
> >
> >> Vovan,
> >>
> >> As one of the hadoop maintainers, please share your point of view on
> >> this.
> >>
> >> —
> >> Denis
> >>
> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov wrote:
> >>>
> >>> Denis
> >>>
> >>> I agree that at the moment there's no reason to split into fabric and
> >>> hadoop editions.
> >>>
> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda wrote:
> >>>
> >>>> Hadoop Accelerator doesn't require any additional libraries compared
> >>>> to those we have in the fabric build. It only lacks some of them, as
> >>>> Val mentioned below.
> >>>>
> >>>> Wouldn't it be better to discontinue the Hadoop Accelerator edition
> >>>> and simply deliver the hadoop jar and its configs as a part of the
> >>>> fabric?
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>> The separate edition for the Hadoop Accelerator was primarily driven
> >>>>> by the default libraries. Hadoop Accelerator requires many more
> >>>>> libraries, as well as configuration settings, compared to the
> >>>>> standard fabric download.
> >>>>>
> >>>>> Now, as far as spark integration is concerned, I am not sure which
> >>>>> edition it belongs in, Hadoop Accelerator or standard fabric.
> >>>>>
> >>>>> D.
> >>>>>
> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda wrote:
> >>>>>
> >>>>>> Dmitriy,
> >>>>>>
> >>>>>> I do believe that you should know why the community decided to
> >>>>>> create a separate edition for the Hadoop Accelerator. What was the
> >>>>>> reason for that? Presently, as I see it, it brings more confusion
> >>>>>> and difficulties rather than benefit.
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik wrote:
> >>>>>>
> >>>>>> In fact, I very much agree with you. Right now, running the
> >>>>>> "accelerator" component in the Bigtop distro gives one a pretty much
> >>>>>> complete fabric anyway. But in order to make just an accelerator
> >>>>>> component we perform quite a bit of voodoo magic during the
> >>>>>> packaging stage of the Bigtop build, shuffling jars from here and
> >>>>>> there. And that's quite crazy, honestly ;)
> >>>>>>
> >>>>>> Cos
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> >>>>>>
> >>>>>> I tend to agree with Denis. I see only these differences between
> >>>>>> the Hadoop Accelerator and Fabric builds (correct me if I miss
> >>>>>> something):
> >>>>>>
> >>>>>> - Limited set of available modules and no optional modules in
> >>>>>>   Hadoop Accelerator.
> >>>>>> - No ignite-hadoop module in Fabric.
> >>>>>> - Additional scripts, configs and instructions included in Hadoop
> >>>>>>   Accelerator.
> >>>>>>
> >>>>>> And the list of included modules frankly looks very weird. Here are
> >>>>>> only some of the issues I noticed:
> >>>>>>
> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need
> >>>>>>   them for Hadoop Acceleration (which I doubt), are they really
> >>>>>>   required or can they be optional?
> >>>>>> - We force users to use the ignite-log4j module without providing
> >>>>>>   other logger options (e.g., SLF4J).
> >>>>>> - We don't include the ignite-aws module. How to use Hadoop
> >>>>>>   Accelerator with S3 discovery?
> >>>>>> - Etc.
> >>>>>>
> >>>>>> It seems to me that if we try to fix all these issues, there will be
> >>>>>> virtually no difference between the Fabric and Hadoop Accelerator
> >>>>>> builds except a couple of scripts and config files. If so, there is
> >>>>>> no reason to have two builds.
> >>>>>>
> >>>>>> -Val
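On the S3 discovery point above: with the ignite-aws module on the
classpath, this is just the S3-based IP finder plugged into the TCP
discovery SPI. A rough sketch; the bucket name and credentials are
obviously placeholders:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder;
    import com.amazonaws.auth.BasicAWSCredentials;

    public class S3DiscoverySketch {
        /** Starts a node whose discovery addresses are shared through an S3 bucket. */
        public static Ignite startWithS3Discovery() {
            TcpDiscoveryS3IpFinder ipFinder = new TcpDiscoveryS3IpFinder();
            ipFinder.setBucketName("my-ignite-discovery-bucket");                 // placeholder
            ipFinder.setAwsCredentials(new BasicAWSCredentials("KEY", "SECRET")); // placeholders

            TcpDiscoverySpi discovery = new TcpDiscoverySpi();
            discovery.setIpFinder(ipFinder);

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDiscoverySpi(discovery);

            return Ignition.start(cfg);
        }
    }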
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda wrote:
> >>>>>>
> >>>>>> On a separate note, in Bigtop we are starting to look into changing
> >>>>>> the way we deliver Ignite, and we'll likely start offering the whole
> >>>>>> 'data fabric' experience instead of the mere "hadoop-acceleration".
> >>>>>>
> >>>>>> And you still will be using the hadoop-accelerator libs of Ignite,
> >>>>>> right?
> >>>>>>
> >>>>>> I'm thinking about whether there is a need to keep releasing Hadoop
> >>>>>> Accelerator as a separate delivery.
> >>>>>> What if we start releasing the accelerator as a part of the standard
> >>>>>> fabric binary, putting the hadoop-accelerator libs under the
> >>>>>> 'optional' folder?
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik wrote:
> >>>>>>
> >>>>>> What Denis said: spark has been added to the Hadoop accelerator as a
> >>>>>> way to boost the performance of more than just the MR compute of the
> >>>>>> Hadoop stack, IIRC.
> >>>>>>
> >>>>>> For what it's worth, Spark is considered a part of Hadoop at large.
> >>>>>>
> >>>>>> On a separate note, in Bigtop we are starting to look into changing
> >>>>>> the way we deliver Ignite, and we'll likely start offering the whole
> >>>>>> 'data fabric' experience instead of the mere "hadoop-acceleration".
> >>>>>>
> >>>>>> Cos
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >>>>>>
> >>>>>> Val,
> >>>>>>
> >>>>>> The Ignite Hadoop module includes not only the map-reduce accelerator
> >>>>>> but the Ignite Hadoop File System component as well. The latter can
> >>>>>> be used in deployments like HDFS + IGFS + Ignite Spark + Spark.
> >>>>>>
> >>>>>> Considering this, I'm for the second solution proposed by you: put
> >>>>>> both the 2.10 and 2.11 ignite-spark modules under the 'optional'
> >>>>>> folder of the Ignite Hadoop Accelerator distribution.
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254
> >>>>>>
> >>>>>> BTW, this task may be affected by or related to the following ones:
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> >>>>>>
> >>>>>> —
> >>>>>> Denis
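To make the IGFS side of that deployment concrete: from Hadoop's (or
Spark's) point of view, the file system component is just another Hadoop
FileSystem implementation reachable through an igfs:// URI. A minimal
sketch; the implementation class comes from the ignite-hadoop module,
while the endpoint and path are assumptions to adjust for a real IGFS
configuration:

    import java.io.OutputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class IgfsClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hadoop file system implementation provided by the ignite-hadoop module.
            conf.set("fs.igfs.impl",
                "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem");

            // igfs://<igfs-name>@<host>:<port>/ -- the endpoint here is a placeholder.
            FileSystem igfs = FileSystem.get(URI.create("igfs://igfs@localhost:10500/"), conf);

            // Write and check a file through the standard Hadoop FileSystem API.
            Path path = new Path("/tmp/hello.txt");
            try (OutputStream out = igfs.create(path, true)) {
                out.write("hello from IGFS".getBytes("UTF-8"));
            }
            System.out.println("exists: " + igfs.exists(path));
        }
    }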
> >>>>>>
> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko wrote:
> >>>>>>
> >>>>>> Hadoop Accelerator is a plugin to Ignite, and this plugin is used by
> >>>>>> Hadoop when running its jobs. The ignite-spark module only provides
> >>>>>> IgniteRDD, which Hadoop obviously will never use.
> >>>>>>
> >>>>>> Is there another use case for Hadoop Accelerator which I'm missing?
> >>>>>>
> >>>>>> -Val
> >>>>>>
> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan wrote:
> >>>>>>
> >>>>>> Why do you think that the spark module is not needed in our hadoop
> >>>>>> build?
> >>>>>>
> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko wrote:
> >>>>>>
> >>>>>> Folks,
> >>>>>>
> >>>>>> Is there anyone who understands the purpose of including the
> >>>>>> ignite-spark module in the Hadoop Accelerator build? I can't figure
> >>>>>> out a use case for which it's needed.
> >>>>>>
> >>>>>> In case we actually need it there, there is an issue then. We
> >>>>>> actually have two ignite-spark modules, for Scala 2.10 and 2.11. In
> >>>>>> the Fabric build everything is good: we put both in the 'optional'
> >>>>>> folder and the user can enable either one. But in Hadoop Accelerator
> >>>>>> there is only 2.11, which means that the build doesn't work with
> >>>>>> 2.10 out of the box.
> >>>>>>
> >>>>>> We should either remove the module from the build, or fix the issue.
> >>>>>>
> >>>>>> -Val
> >>>
> >>> --
> >>> Sergey Kozlov
> >>> GridGain Systems
> >>> www.gridgain.com
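For completeness, the IgniteRDD discussed above is essentially what
ignite-spark adds on top of an Ignite cache: a Spark RDD view of the cache
that Spark jobs can write to and read from. A minimal Java sketch; the
Spring config path and cache name are placeholders, and the API names are
recalled from the ignite-spark module, so verify them against the version
you build with:

    import java.util.Arrays;
    import org.apache.ignite.spark.JavaIgniteContext;
    import org.apache.ignite.spark.JavaIgniteRDD;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class IgniteRddSketch {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("ignite-rdd-sketch").setMaster("local[2]"));

            // Ignite context over an existing Spark context; the Spring config path is a placeholder.
            JavaIgniteContext<Integer, Integer> ic =
                new JavaIgniteContext<>(sc, "config/default-config.xml");

            // RDD view of an Ignite cache; the cache name is a placeholder.
            JavaIgniteRDD<Integer, Integer> rdd = ic.fromCache("sharedNumbers");

            // Write pairs into the cache through the RDD, then read them back.
            rdd.savePairs(sc.parallelize(Arrays.asList(1, 2, 3, 4))
                .mapToPair(i -> new Tuple2<>(i, i * i)));
            System.out.println("entries in cache: " + rdd.count());

            sc.stop();
        }
    }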