ignite-dev mailing list archives

From Sergey Kozlov <skoz...@gridgain.com>
Subject Re: ignite-spark module in Hadoop Accelerator
Date Thu, 08 Dec 2016 09:55:38 GMT
Another point is that the hadoop edition has no optional modules. It forces the
user to download the fabric edition and copy modules from there.
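
For example, to get the ignite-spark module together with the accelerator today, a
user has to do roughly the following (archive names, version and paths below are
illustrative only, not taken from an actual release):

    # download and unpack both editions, then copy the optional module
    # from the fabric build into the accelerator's libs directory
    unzip apache-ignite-fabric-1.8.0-bin.zip
    unzip apache-ignite-hadoop-1.8.0-bin.zip
    cp -r apache-ignite-fabric-1.8.0-bin/libs/optional/ignite-spark \
          apache-ignite-hadoop-1.8.0-bin/libs/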

On Thu, Dec 8, 2016 at 12:19 PM, Vladimir Ozerov <vozerov@gridgain.com>
wrote:

> The work we create for ourselves is maintaining two separate editions, while
> everything can easily be merged into a single distribution.
>
> On Wed, Dec 7, 2016 at 3:29 AM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> wrote:
>
> > Why are we creating work for ourselves? What is wrong with having 2
> > downloads?
> >
> > Hadoop accelerator edition exists for the following 2 purposes only:
> >
> >    - accelerate HDFS with Ignite In-Memory File System (IGFS)
> >    - accelerate Hadoop MapReduce with Ignite In-Memory MapReduce
> >
> > I agree with the original email from Valentin that Spark libs should not be
> > included in the hadoop-accelerator download. Spark integration is not part
> > of Ignite Hadoop Accelerator and should be included only in the Ignite
> > fabric download.
> >
> > D.
> >
> >
> >
> > On Tue, Dec 6, 2016 at 12:30 AM, Sergey Kozlov <skozlov@gridgain.com>
> > wrote:
> >
> > > Hi
> > >
> > > In general I agree with Vladimir but would suggest more technical
> > > details:
> > >
> > > Due to the need to collect distinct CLASS_PATHs for the fabric and hadoop
> > > editions we can change the logic of processing the libs directory:
> > >
> > > 1. Introduce libs/hadoop and libs/fabric directories. These directories
> > > are the root directories for the modules specific to the hadoop and
> > > fabric editions respectively.
> > > 2. Change how directories are collected into CLASS_PATH for ignite.sh:
> > >  - collect everything from libs except libs/hadoop
> > >  - collect everything from libs/fabric
> > > 3. Add an ignite-hadoop-accelerator.{sh|bat} script (it may also perform
> > > the initial setup instead of setup-hadoop.sh) that constructs CLASS_PATH
> > > in the following way:
> > >  - collect everything from libs except libs/fabric
> > >  - collect everything from libs/hadoop
> > >
> > > This approach gives us the following (see the sketch right below):
> > >  - share common modules across both editions (just put them in libs)
> > >  - keep edition-specific modules separate (put them in either libs/hadoop
> > > or libs/fabric)
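
A minimal sketch of the proposed classpath logic for the fabric side (ignite.sh);
the variable names and the loop are illustrative, not actual script contents, and
the accelerator script would do the mirror image (skip libs/fabric, include
libs/hadoop instead):

    # jars directly under libs/ are always on the classpath
    CP="${IGNITE_HOME}/libs/*"
    # add every subdirectory of libs/ except the hadoop-specific one;
    # libs/fabric and the shared module directories end up included
    for dir in "${IGNITE_HOME}"/libs/*/; do
        [ "$(basename "$dir")" = "hadoop" ] && continue
        CP="${CP}:${dir}*"
    done
    CLASSPATH="${CP}"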
> > >
> > >
> > >
> > >
> > > On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <vozerov@gridgain.com>
> > > wrote:
> > >
> > > > Agree. I do not see any reasons to have two different products. Instead,
> > > > just add ignite-hadoop.jar to the distribution, and add a separate script
> > > > to start the Accelerator. We can go the same way as we did for "platforms":
> > > > create a separate top-level folder "hadoop" in the Fabric distribution and
> > > > put all related Hadoop Accelerator stuff there.
> > > >
> > > > On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> > > > valentin.kulichenko@gmail.com> wrote:
> > > >
> > > > > In general, I don't quite understand why we should move any component
> > > > > outside of Fabric. The concept of Fabric is to have everything, no? :) In
> > > > > other words, if a cluster was once set up for Hadoop Acceleration, why not
> > > > > allow creating a cache and/or running a task using native Ignite APIs
> > > > > sometime later. We follow this approach with all our components and
> > > > > modules, but not with ignite-hadoop for some reason.
> > > > >
> > > > > If we get rid of the Hadoop Accelerator build, initial setup of the Hadoop
> > > > > integration can potentially become a bit more complicated, but with proper
> > > > > documentation I don't think this is going to be a problem, because it
> > > > > requires multiple steps now anyway. And frankly the same can be said about
> > > > > any optional module we have - enabling it requires some additional steps
> > > > > as it doesn't work out of the box.
> > > > >
> > > > > -Val
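
(For reference, enabling an optional module in the fabric build is normally just a
copy into libs/ before starting the node; the exact module directory name below is
illustrative:)

    # put an optional module onto the runtime classpath of the fabric build
    cp -r "${IGNITE_HOME}/libs/optional/ignite-spark" "${IGNITE_HOME}/libs/"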
> > > > >
> > > > > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dmagda@apache.org>
> > > > > wrote:
> > > > >
> > > > >> Dmitriy,
> > > > >>
> > > > >> >   - the "lib/" folder has many fewer libraries than in fabric, simply
> > > > >> >   because many dependencies don't make sense for a hadoop environment
> > > > >>
> > > > >> This is exactly the reason why the discussion moved in this direction.
> > > > >>
> > > > >> How do we decide what should be a part of Hadoop Accelerator and what
> > > > >> should be excluded? If you read through Val's and Cos's comments below
> > > > >> you'll get more insights.
> > > > >>
> > > > >> In general, we need to have a clear understanding of what the Hadoop
> > > > >> Accelerator distribution's use case is. This will help us to come up
> > > > >> with a final decision.
> > > > >>
> > > > >> If the accelerator is supposed to be plugged into an existing Hadoop
> > > > >> environment by enabling MapReduce and/or IGFS at the configuration level,
> > > > >> then we should simply remove the ignite-indexing and ignite-spark modules
> > > > >> and add additional logging libs as well as the AWS and GCE integration
> > > > >> packages.
> > > > >>
> > > > >> But wait, what if a user wants to leverage Ignite Spark integration,
> > > > >> Ignite SQL or geospatial queries, or Ignite streaming capabilities after
> > > > >> he has already plugged in the accelerator? What if he is ready to modify
> > > > >> his existing code? He can't simply switch to the fabric on the
> > > > >> application side because the fabric doesn't include the accelerator's
> > > > >> libs that are still needed. He can't rely solely on the accelerator
> > > > >> distribution either, since it misses some libs. And, obviously, the user
> > > > >> starts shuffling libs between the fabric and the accelerator to get what
> > > > >> is required.
> > > > >>
> > > > >> Vladimir, can you share your thoughts on this?
> > > > >>
> > > > >> —
> > > > >> Denis
> > > > >>
> > > > >>
> > > > >>
> > > > >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > > > >> > wrote:
> > > > >> >
> > > > >> > Guys,
> > > > >> >
> > > > >> > I just downloaded the hadoop accelerator and here are the differences
> > > > >> > from the fabric edition that jump at me right away:
> > > > >> >
> > > > >> >   - the "bin/" folder has "setup-hadoop" scripts
> > > > >> >   - the "config/" folder has "hadoop" subfolder with necessary
> > > > >> >   hadoop-related configuration
> > > > >> >   - the "lib/" folder has many fewer libraries than in fabric, simply
> > > > >> >   because many dependencies don't make sense for a hadoop environment
> > > > >> >
> > > > >> > I currently don't see how we can merge the hadoop accelerator with the
> > > > >> > standard fabric edition.
> > > > >> >
> > > > >> > D.
> > > > >> >
> > > > >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dmagda@apache.org>
> > > > >> > wrote:
> > > > >> >
> > > > >> >> Vovan,
> > > > >> >>
> > > > >> >> As one of the hadoop maintainers, please share your point of view
> > > > >> >> on this.
> > > > >> >>
> > > > >> >> —
> > > > >> >> Denis
> > > > >> >>
> > > > >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <skozlov@gridgain.com>
> > > > >> >>> wrote:
> > > > >> >>>
> > > > >> >>> Denis
> > > > >> >>>
> > > > >> >>> I agree that at the moment there's no reason to split into fabric
> > > > >> >>> and hadoop editions.
> > > > >> >>>
> > > > >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dmagda@apache.org>
> > > > >> >>> wrote:
> > > > >> >>>
> > > > >> >>>> Hadoop Accelerator doesn't require any additional libraries compared
> > > > >> >>>> to those we have in the fabric build. It only lacks some of them, as
> > > > >> >>>> Val mentioned below.
> > > > >> >>>>
> > > > >> >>>> Wouldn't it be better to discontinue the Hadoop Accelerator edition
> > > > >> >>>> and simply deliver the hadoop jar and its configs as a part of the
> > > > >> >>>> fabric?
> > > > >> >>>>
> > > > >> >>>> —
> > > > >> >>>> Denis
> > > > >> >>>>
> > > > >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > > > >> >>>>> wrote:
> > > > >> >>>>>
> > > > >> >>>>> The separate edition for the Hadoop Accelerator was primarily driven
> > > > >> >>>>> by the default libraries. Hadoop Accelerator requires many more
> > > > >> >>>>> libraries as well as configuration settings compared to the standard
> > > > >> >>>>> fabric download.
> > > > >> >>>>>
> > > > >> >>>>> Now, as far as spark integration is concerned, I am not sure which
> > > > >> >>>>> edition it belongs in, Hadoop Accelerator or standard fabric.
> > > > >> >>>>>
> > > > >> >>>>> D.
> > > > >> >>>>>
> > > > >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dmagda@apache.org>
> > > > >> >>>>> wrote:
> > > > >> >>>>>
> > > > >> >>>>>> *Dmitriy*,
> > > > >> >>>>>>
> > > > >> >>>>>> I do believe that you should know why the community decided to
> > > > >> >>>>>> create a separate edition for the Hadoop Accelerator. What was the
> > > > >> >>>>>> reason for that? Presently, as I see it, it brings more confusion
> > > > >> >>>>>> and difficulties rather than benefit.
> > > > >> >>>>>>
> > > > >> >>>>>> —
> > > > >> >>>>>> Denis
> > > > >> >>>>>>
> > > > >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <cos@apache.org>
> > > > >> >>>>>> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> In fact I very much agree with you. Right now, running the
> > > > >> >>>>>> "accelerator" component in the Bigtop distro gives one a pretty
> > > > >> >>>>>> much complete fabric anyway. But in order to make just an
> > > > >> >>>>>> accelerator component we perform quite a bit of voodoo magic during
> > > > >> >>>>>> the packaging stage of the Bigtop build, shuffling jars from here
> > > > >> >>>>>> and there. And that's quite crazy, honestly ;)
> > > > >> >>>>>>
> > > > >> >>>>>> Cos
> > > > >> >>>>>>
> > > > >> >>>>>> On Mon, Nov 21, 2016 at 03:33 PM, Valentin Kulichenko wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> I tend to agree with Denis. I see only these differences between
> > > > >> >>>>>> Hadoop Accelerator and Fabric builds (correct me if I miss
> > > > >> >>>>>> something):
> > > > >> >>>>>>
> > > > >> >>>>>> - Limited set of available modules and no optional modules in
> > > > >> >>>>>> Hadoop Accelerator.
> > > > >> >>>>>> - No ignite-hadoop module in Fabric.
> > > > >> >>>>>> - Additional scripts, configs and instructions included in Hadoop
> > > > >> >>>>>> Accelerator.
> > > > >> >>>>>>
> > > > >> >>>>>> And the list of included modules frankly looks very weird. Here
> > > > >> >>>>>> are only some of the issues I noticed:
> > > > >> >>>>>>
> > > > >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need
> > > > >> >>>>>> them for Hadoop Acceleration (which I doubt), are they really
> > > > >> >>>>>> required or can they be optional?
> > > > >> >>>>>> - We force the use of the ignite-log4j module without providing
> > > > >> >>>>>> other logger options (e.g., SLF4J).
> > > > >> >>>>>> - We don't include the ignite-aws module. How to use Hadoop
> > > > >> >>>>>> Accelerator with S3 discovery?
> > > > >> >>>>>> - Etc.
> > > > >> >>>>>>
> > > > >> >>>>>> It seems to me that if we try to fix all these issues, there will
> > > > >> >>>>>> be virtually no difference between the Fabric and Hadoop Accelerator
> > > > >> >>>>>> builds except a couple of scripts and config files. If so, there is
> > > > >> >>>>>> no reason to have two builds.
> > > > >> >>>>>>
> > > > >> >>>>>> -Val
> > > > >> >>>>>>
> > > > >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dmagda@apache.org>
> > > > >> >>>>>> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> On a separate note, in Bigtop, we are starting to look into
> > > > >> >>>>>> changing the way we deliver Ignite and we'll likely start offering
> > > > >> >>>>>> the whole 'data fabric' experience instead of the mere
> > > > >> >>>>>> "hadoop-acceleration".
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> And you will still be using the hadoop-accelerator libs of Ignite,
> > > > >> >>>>>> right?
> > > > >> >>>>>>
> > > > >> >>>>>> I'm wondering whether there is a need to keep releasing Hadoop
> > > > >> >>>>>> Accelerator as a separate delivery.
> > > > >> >>>>>> What if we start releasing the accelerator as a part of the
> > > > >> >>>>>> standard fabric binary, putting the hadoop-accelerator libs under
> > > > >> >>>>>> the 'optional' folder?
> > > > >> >>>>>>
> > > > >> >>>>>> —
> > > > >> >>>>>> Denis
> > > > >> >>>>>>
> > > > >> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <cos@apache.org>
> > > > >> >>>>>> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> What Denis said: spark has been added to the Hadoop accelerator as
> > > > >> >>>>>> a way to boost the performance of more than just MR compute of the
> > > > >> >>>>>> Hadoop stack, IIRC.
> > > > >> >>>>>>
> > > > >> >>>>>> For what it's worth, Spark is considered a part of Hadoop at large.
> > > > >> >>>>>>
> > > > >> >>>>>> On a separate note, in Bigtop, we are starting to look into
> > > > >> >>>>>> changing the way we deliver Ignite and we'll likely start offering
> > > > >> >>>>>> the whole 'data fabric' experience instead of the mere
> > > > >> >>>>>> "hadoop-acceleration".
> > > > >> >>>>>>
> > > > >> >>>>>> Cos
> > > > >> >>>>>>
> > > > >> >>>>>> On Mon, Nov 21, 2016 at 09:54 AM, Denis Magda wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> Val,
> > > > >> >>>>>>
> > > > >> >>>>>> The Ignite Hadoop module includes not only the map-reduce
> > > > >> >>>>>> accelerator but the Ignite Hadoop File System component as well.
> > > > >> >>>>>> The latter can be used in deployments like HDFS + IGFS + Ignite
> > > > >> >>>>>> Spark + Spark.
> > > > >> >>>>>>
> > > > >> >>>>>> Considering this, I'm for the second solution proposed by you: put
> > > > >> >>>>>> both 2.10 and 2.11 ignite-spark modules under the 'optional' folder
> > > > >> >>>>>> of the Ignite Hadoop Accelerator distribution.
> > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> BTW, this task may be affected by or related to the following ones:
> > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
> > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> > > > >> >>>>>>
> > > > >> >>>>>> —
> > > > >> >>>>>> Denis
> > > > >> >>>>>>
> > > > >> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> > > > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
> > > > >> >>>>>> Hadoop when running its jobs. The ignite-spark module only provides
> > > > >> >>>>>> IgniteRDD, which Hadoop obviously will never use.
> > > > >> >>>>>>
> > > > >> >>>>>> Is there another use case for Hadoop Accelerator which I'm missing?
> > > > >> >>>>>>
> > > > >> >>>>>> -Val
> > > > >> >>>>>>
> > > > >> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> > > > >> >>>>>> dsetrakyan@apache.org> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> Why do you think that the spark module is not needed in our hadoop
> > > > >> >>>>>> build?
> > > > >> >>>>>>
> > > > >> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> > > > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> Folks,
> > > > >> >>>>>>
> > > > >> >>>>>> Is there anyone who understands the purpose of including the
> > > > >> >>>>>> ignite-spark module in the Hadoop Accelerator build? I can't figure
> > > > >> >>>>>> out a use case for which it's needed.
> > > > >> >>>>>>
> > > > >> >>>>>> In case we actually need it there, there is an issue then. We
> > > > >> >>>>>> actually have two ignite-spark modules, for 2.10 and 2.11. In the
> > > > >> >>>>>> Fabric build everything is good, we put both in the 'optional'
> > > > >> >>>>>> folder and the user can enable either one. But in Hadoop Accelerator
> > > > >> >>>>>> there is only 2.11, which means that the build doesn't work with
> > > > >> >>>>>> 2.10 out of the box.
> > > > >> >>>>>>
> > > > >> >>>>>> We should either remove the module from the build, or fix the issue.
> > > > >> >>>>>>
> > > > >> >>>>>> -Val
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>
> > > > >> >>>
> > > > >> >>> --
> > > > >> >>> Sergey Kozlov
> > > > >> >>> GridGain Systems
> > > > >> >>> www.gridgain.com
> > > > >> >>
> > > > >> >>
> > > > >>
> > > > >>
> > > > >
> > > >
> > > >
> > > > --
> > > > Vladimir Ozerov
> > > > Senior Software Architect
> > > > GridGain Systems
> > > > www.gridgain.com
> > > > *+7 (960) 283 98 40*
> > > >
> > >
> > >
> > >
> > > --
> > > Sergey Kozlov
> > > GridGain Systems
> > > www.gridgain.com
> > >
> >
>
>
>
> --
> Vladimir Ozerov
> Senior Software Architect
> GridGain Systems
> www.gridgain.com
> *+7 (960) 283 98 40*
>



-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com
