From: Valentin Kulichenko
Date: Fri, 2 Dec 2016 11:46:22 -0800
Subject: Re: ignite-spark module in Hadoop Accelerator
To: dev@ignite.apache.org
Cc: Vladimir Ozerov

In general, I don't quite understand why we should move any component
outside of Fabric. The concept of Fabric is to have everything, no? :) In
other words, if a cluster was once set up for Hadoop Acceleration, why not
allow creating a cache and/or running a task using native Ignite APIs
sometime later? We follow this approach with all our components and
modules, but not with ignite-hadoop for some reason.

If we get rid of the Hadoop Accelerator build, the initial setup of the
Hadoop integration can potentially become a bit more complicated, but with
proper documentation I don't think this is going to be a problem, because
it requires multiple steps now anyway. And frankly the same can be said
about any optional module we have - enabling it requires some additional
steps, as it doesn't work out of the box.

-Val

On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda wrote:

> Dmitriy,
>
> > - the "lib/" folder has far fewer libraries than in fabric, simply
> > because many dependencies don't make sense for a hadoop environment
>
> This is exactly why the discussion moved in this direction.
>
> How do we decide what should be a part of Hadoop Accelerator and what
> should be excluded? If you read through Val's and Cos's comments below
> you'll get more insights.
>
> In general, we need a clear understanding of what the Hadoop Accelerator
> distribution's use case is. This will help us come to a final decision.
>
> If the accelerator is supposed to be plugged into an existing Hadoop
> environment by enabling MapReduce and/or IGFS at the configuration level,
> then we should simply remove the ignite-indexing and ignite-spark modules
> and add additional logging libs as well as the AWS and GCE integration
> packages.
>
> But wait, what if a user wants to leverage the Ignite Spark integration,
> Ignite SQL or geospatial queries, or Ignite streaming capabilities after
> he has already plugged in the accelerator? What if he is ready to modify
> his existing code? He can't simply switch to the fabric on the
> application side, because the fabric doesn't include the accelerator's
> libs that are still needed. He can't rely solely on the accelerator
> distribution either, since it misses some libs. And, obviously, the user
> starts shuffling libs between the fabric and the accelerator to get what
> is required.
>
> Vladimir, can you share your thoughts on this?
>
> —
> Denis
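For reference, the "configuration-level" plug-in described above normally
boils down to a handful of Hadoop properties that point MapReduce at the
Ignite job tracker and register IGFS as a Hadoop file system. Below is a
minimal sketch using the Hadoop Configuration API; the property names are
the ones I recall from the accelerator docs, and the endpoint value is a
placeholder, so double-check both against your setup:

    import org.apache.hadoop.conf.Configuration;

    public class IgniteAcceleratorConfSketch {
        /** Builds a Hadoop configuration that routes MR jobs and igfs:// paths to Ignite. */
        public static Configuration igniteAcceleratedConf() {
            Configuration conf = new Configuration();

            // Run MapReduce jobs through the Ignite job tracker instead of YARN.
            conf.set("mapreduce.framework.name", "ignite");
            conf.set("mapreduce.jobtracker.address", "localhost:11211"); // placeholder host:port

            // Register the IGFS file system implementations shipped in ignite-hadoop.
            conf.set("fs.igfs.impl",
                "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem");
            conf.set("fs.AbstractFileSystem.igfs.impl",
                "org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem");

            return conf;
        }
    }

In practice the same properties live in core-site.xml and mapred-site.xml,
which is what the "config/hadoop" folder of the accelerator build
pre-packages.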
>
> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan wrote:
> >
> > Guys,
> >
> > I just downloaded the hadoop accelerator and here are the differences
> > from the fabric edition that jump at me right away:
> >
> >    - the "bin/" folder has "setup-hadoop" scripts
> >    - the "config/" folder has a "hadoop" subfolder with the necessary
> >    hadoop-related configuration
> >    - the "lib/" folder has far fewer libraries than in fabric, simply
> >    because many dependencies don't make sense for a hadoop environment
> >
> > I currently don't see how we can merge the hadoop accelerator with the
> > standard fabric edition.
> >
> > D.
> >
> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda wrote:
> >
> >> Vovan,
> >>
> >> As one of the hadoop maintainers, please share your point of view on
> >> this.
> >>
> >> —
> >> Denis
> >>
> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov wrote:
> >>>
> >>> Denis
> >>>
> >>> I agree that at the moment there's no reason to split into fabric and
> >>> hadoop editions.
> >>>
> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda wrote:
> >>>
> >>>> Hadoop Accelerator doesn't require any additional libraries compared
> >>>> to those we have in the fabric build. It only lacks some of them, as
> >>>> Val mentioned below.
> >>>>
> >>>> Wouldn't it be better to discontinue the Hadoop Accelerator edition
> >>>> and simply deliver the hadoop jar and its configs as a part of the
> >>>> fabric?
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>> The separate edition for the Hadoop Accelerator was primarily driven
> >>>>> by the default libraries. Hadoop Accelerator requires many more
> >>>>> libraries, as well as configuration settings, compared to the
> >>>>> standard fabric download.
> >>>>>
> >>>>> Now, as far as spark integration is concerned, I am not sure which
> >>>>> edition it belongs in, Hadoop Accelerator or standard fabric.
> >>>>>
> >>>>> D.
> >>>>>
> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda wrote:
> >>>>>
> >>>>>> Dmitriy,
> >>>>>>
> >>>>>> I do believe that you should know why the community decided to
> >>>>>> create a separate edition for the Hadoop Accelerator. What was the
> >>>>>> reason for that? Presently, as I see it, it brings more confusion
> >>>>>> and difficulties rather than benefit.
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik wrote:
> >>>>>>
> >>>>>> In fact, I very much agree with you. Right now, running the
> >>>>>> "accelerator" component in the Bigtop distro gives one a pretty much
> >>>>>> complete fabric anyway. But in order to make just an accelerator
> >>>>>> component we perform quite a bit of voodoo magic during the
> >>>>>> packaging stage of the Bigtop build, shuffling jars from here and
> >>>>>> there. And that's quite crazy, honestly ;)
> >>>>>>
> >>>>>> Cos
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> >>>>>>
> >>>>>> I tend to agree with Denis. I see only these differences between
> >>>>>> the Hadoop Accelerator and Fabric builds (correct me if I miss
> >>>>>> something):
> >>>>>>
> >>>>>> - Limited set of available modules and no optional modules in
> >>>>>>   Hadoop Accelerator.
> >>>>>> - No ignite-hadoop module in Fabric.
> >>>>>> - Additional scripts, configs and instructions included in Hadoop
> >>>>>>   Accelerator.
> >>>>>>
> >>>>>> And the list of included modules frankly looks very weird. Here are
> >>>>>> only some of the issues I noticed:
> >>>>>>
> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need
> >>>>>>   them for Hadoop Acceleration (which I doubt), are they really
> >>>>>>   required or can they be optional?
> >>>>>> - We force users to use the ignite-log4j module without providing
> >>>>>>   other logger options (e.g., SLF4J).
> >>>>>> - We don't include the ignite-aws module. How to use Hadoop
> >>>>>>   Accelerator with S3 discovery?
> >>>>>> - Etc.
> >>>>>>
> >>>>>> It seems to me that if we try to fix all these issues, there will be
> >>>>>> virtually no difference between the Fabric and Hadoop Accelerator
> >>>>>> builds except a couple of scripts and config files. If so, there is
> >>>>>> no reason to have two builds.
> >>>>>>
> >>>>>> -Val
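On the S3 discovery point above: with the ignite-aws module on the
classpath, this is just the S3-based IP finder plugged into the TCP
discovery SPI. A rough sketch; the bucket name and credentials are
obviously placeholders:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder;
    import com.amazonaws.auth.BasicAWSCredentials;

    public class S3DiscoverySketch {
        /** Starts a node whose discovery addresses are shared through an S3 bucket. */
        public static Ignite startWithS3Discovery() {
            TcpDiscoveryS3IpFinder ipFinder = new TcpDiscoveryS3IpFinder();
            ipFinder.setBucketName("my-ignite-discovery-bucket");                 // placeholder
            ipFinder.setAwsCredentials(new BasicAWSCredentials("KEY", "SECRET")); // placeholders

            TcpDiscoverySpi discovery = new TcpDiscoverySpi();
            discovery.setIpFinder(ipFinder);

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDiscoverySpi(discovery);

            return Ignition.start(cfg);
        }
    }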
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda wrote:
> >>>>>>
> >>>>>> On a separate note, in Bigtop we are starting to look into changing
> >>>>>> the way we deliver Ignite, and we'll likely start offering the whole
> >>>>>> 'data fabric' experience instead of the mere "hadoop-acceleration".
> >>>>>>
> >>>>>> And you still will be using the hadoop-accelerator libs of Ignite,
> >>>>>> right?
> >>>>>>
> >>>>>> I'm thinking about whether there is a need to keep releasing Hadoop
> >>>>>> Accelerator as a separate delivery.
> >>>>>> What if we start releasing the accelerator as a part of the standard
> >>>>>> fabric binary, putting the hadoop-accelerator libs under the
> >>>>>> 'optional' folder?
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik wrote:
> >>>>>>
> >>>>>> What Denis said: spark has been added to the Hadoop accelerator as a
> >>>>>> way to boost the performance of more than just the MR compute of the
> >>>>>> Hadoop stack, IIRC.
> >>>>>>
> >>>>>> For what it's worth, Spark is considered a part of Hadoop at large.
> >>>>>>
> >>>>>> On a separate note, in Bigtop we are starting to look into changing
> >>>>>> the way we deliver Ignite, and we'll likely start offering the whole
> >>>>>> 'data fabric' experience instead of the mere "hadoop-acceleration".
> >>>>>>
> >>>>>> Cos
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >>>>>>
> >>>>>> Val,
> >>>>>>
> >>>>>> The Ignite Hadoop module includes not only the map-reduce accelerator
> >>>>>> but the Ignite Hadoop File System component as well. The latter can
> >>>>>> be used in deployments like HDFS + IGFS + Ignite Spark + Spark.
> >>>>>>
> >>>>>> Considering this, I'm for the second solution proposed by you: put
> >>>>>> both the 2.10 and 2.11 ignite-spark modules under the 'optional'
> >>>>>> folder of the Ignite Hadoop Accelerator distribution.
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254
> >>>>>>
> >>>>>> BTW, this task may be affected by or related to the following ones:
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> >>>>>>
> >>>>>> —
> >>>>>> Denis
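To make the IGFS side of that deployment concrete: from Hadoop's (or
Spark's) point of view, the file system component is just another Hadoop
FileSystem implementation reachable through an igfs:// URI. A minimal
sketch; the implementation class comes from the ignite-hadoop module,
while the endpoint and path are assumptions to adjust for a real IGFS
configuration:

    import java.io.OutputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class IgfsClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hadoop file system implementation provided by the ignite-hadoop module.
            conf.set("fs.igfs.impl",
                "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem");

            // igfs://<igfs-name>@<host>:<port>/ -- the endpoint here is a placeholder.
            FileSystem igfs = FileSystem.get(URI.create("igfs://igfs@localhost:10500/"), conf);

            // Write and check a file through the standard Hadoop FileSystem API.
            Path path = new Path("/tmp/hello.txt");
            try (OutputStream out = igfs.create(path, true)) {
                out.write("hello from IGFS".getBytes("UTF-8"));
            }
            System.out.println("exists: " + igfs.exists(path));
        }
    }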
> >>>>>>
> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko wrote:
> >>>>>>
> >>>>>> Hadoop Accelerator is a plugin to Ignite, and this plugin is used by
> >>>>>> Hadoop when running its jobs. The ignite-spark module only provides
> >>>>>> IgniteRDD, which Hadoop obviously will never use.
> >>>>>>
> >>>>>> Is there another use case for Hadoop Accelerator which I'm missing?
> >>>>>>
> >>>>>> -Val
> >>>>>>
> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan wrote:
> >>>>>>
> >>>>>> Why do you think that the spark module is not needed in our hadoop
> >>>>>> build?
> >>>>>>
> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko wrote:
> >>>>>>
> >>>>>> Folks,
> >>>>>>
> >>>>>> Is there anyone who understands the purpose of including the
> >>>>>> ignite-spark module in the Hadoop Accelerator build? I can't figure
> >>>>>> out a use case for which it's needed.
> >>>>>>
> >>>>>> In case we actually need it there, there is an issue then. We
> >>>>>> actually have two ignite-spark modules, for Scala 2.10 and 2.11. In
> >>>>>> the Fabric build everything is good: we put both in the 'optional'
> >>>>>> folder and the user can enable either one. But in Hadoop Accelerator
> >>>>>> there is only 2.11, which means that the build doesn't work with
> >>>>>> 2.10 out of the box.
> >>>>>>
> >>>>>> We should either remove the module from the build, or fix the issue.
> >>>>>>
> >>>>>> -Val
> >>>
> >>> --
> >>> Sergey Kozlov
> >>> GridGain Systems
> >>> www.gridgain.com
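For completeness, the IgniteRDD discussed above is essentially what
ignite-spark adds on top of an Ignite cache: a Spark RDD view of the cache
that Spark jobs can write to and read from. A minimal Java sketch; the
Spring config path and cache name are placeholders, and the API names are
recalled from the ignite-spark module, so verify them against the version
you build with:

    import java.util.Arrays;
    import org.apache.ignite.spark.JavaIgniteContext;
    import org.apache.ignite.spark.JavaIgniteRDD;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class IgniteRddSketch {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("ignite-rdd-sketch").setMaster("local[2]"));

            // Ignite context over an existing Spark context; the Spring config path is a placeholder.
            JavaIgniteContext<Integer, Integer> ic =
                new JavaIgniteContext<>(sc, "config/default-config.xml");

            // RDD view of an Ignite cache; the cache name is a placeholder.
            JavaIgniteRDD<Integer, Integer> rdd = ic.fromCache("sharedNumbers");

            // Write pairs into the cache through the RDD, then read them back.
            rdd.savePairs(sc.parallelize(Arrays.asList(1, 2, 3, 4))
                .mapToPair(i -> new Tuple2<>(i, i * i)));
            System.out.println("entries in cache: " + rdd.count());

            sc.stop();
        }
    }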