ignite-dev mailing list archives

From Denis Magda <dma...@apache.org>
Subject Re: ignite-spark module in Hadoop Accelerator
Date Fri, 02 Dec 2016 19:38:28 GMT
Dmitriy,

>   - the "lib/" folder has much fewer libraries than in fabric, simply
>   because many dependencies don't make sense for a hadoop environment

This is exactly why the discussion moved in this direction.

How do we decide what should be part of the Hadoop Accelerator and what should be excluded?
If you read through Val's and Cos's comments below you'll get more insight.

In general, we need a clear understanding of what the Hadoop Accelerator distribution's
use case is. This will help us come to a final decision.

If the accelerator is supposed to be plugged into an existing Hadoop environment by enabling
MapReduce and/or IGFS at the configuration level, then we should simply remove the ignite-indexing
and ignite-spark modules and add additional logging libs as well as the AWS and GCE integration packages.
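For illustration, plugging the accelerator in "at the configuration level" usually amounts to Hadoop config entries along these lines (a sketch based on the property names in the Ignite Hadoop Accelerator docs; the jobtracker endpoint value is a placeholder for whatever the node actually exposes):

```xml
<!-- mapred-site.xml: route MapReduce jobs through Ignite -->
<property>
  <name>mapreduce.framework.name</name>
  <value>ignite</value>
</property>
<property>
  <name>mapreduce.jobtracker.address</name>
  <value>localhost:11211</value>
</property>

<!-- core-site.xml: register IGFS as a Hadoop file system -->
<property>
  <name>fs.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>
```

None of this requires ignite-indexing or ignite-spark on the classpath, which is the point of the argument above.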

But, wait, what if a user wants to benefit from the Ignite Spark integration, Ignite SQL or geospatial
queries, or Ignite streaming capabilities after he has already plugged in the accelerator? What
if he is ready to modify his existing code? He can't simply switch to the fabric on the application
side because the fabric doesn't include the accelerator's libs that are still needed. He can't
rely solely on the accelerator distribution either, since it misses some libs. And, obviously,
the user ends up shuffling libs between the fabric and the accelerator to get what is required.
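For contrast, in the fabric build enabling an extra module is a single documented step: copy it from libs/optional into libs before starting the node. A sketch, using a throwaway /tmp directory in place of a real Ignite distribution:

```shell
# Mock distribution layout (a real one would be $IGNITE_HOME from the download).
DEMO_HOME=/tmp/ignite-demo
mkdir -p "$DEMO_HOME/libs/optional/ignite-spark"

# Enabling the optional module: copy it up into libs/.
cp -r "$DEMO_HOME/libs/optional/ignite-spark" "$DEMO_HOME/libs/"

# The node picks up everything under libs/ on startup.
ls "$DEMO_HOME/libs"
```

If the accelerator libs shipped in the fabric's optional folder, this one step would replace the cross-distribution shuffling described above.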

Vladimir, can you share your thoughts on this?

—
Denis  



> On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
> 
> Guys,
> 
> I just downloaded the hadoop accelerator and here are the differences from
> the fabric edition that jump out at me right away:
> 
>   - the "bin/" folder has "setup-hadoop" scripts
>   - the "config/" folder has "hadoop" subfolder with necessary
>   hadoop-related configuration
>   - the "lib/" folder has much fewer libraries than in fabric, simply
>   because many dependencies don't make sense for a hadoop environment
> 
> I currently don't see how we can merge the hadoop accelerator with standard
> fabric edition.
> 
> D.
> 
> On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dmagda@apache.org> wrote:
> 
>> Vovan,
>> 
>> As one of the hadoop maintainers, please share your point of view on this.
>> 
>> —
>> Denis
>> 
>>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <skozlov@gridgain.com> wrote:
>>> 
>>> Denis
>>> 
>>> I agree that at the moment there's no reason to split into fabric and
>>> hadoop editions.
>>> 
>>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dmagda@apache.org> wrote:
>>> 
>>>> Hadoop Accelerator doesn't require any additional libraries compared to
>>>> those we have in the fabric build. It only lacks some of them, as Val
>>>> mentioned below.
>>>> 
>>>> Wouldn't it be better to discontinue the Hadoop Accelerator edition and
>>>> simply deliver the hadoop jar and its configs as part of the fabric?
>>>> 
>>>> —
>>>> Denis
>>>> 
>>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
>>>>> 
>>>>> The separate edition for the Hadoop Accelerator was primarily driven by
>>>>> the default libraries. Hadoop Accelerator requires many more libraries as
>>>>> well as configuration settings compared to the standard fabric download.
>>>>> 
>>>>> Now, as far as spark integration is concerned, I am not sure which edition
>>>>> it belongs in, Hadoop Accelerator or standard fabric.
>>>>> 
>>>>> D.
>>>>> 
>>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dmagda@apache.org> wrote:
>>>>> 
>>>>>> *Dmitriy*,
>>>>>> 
>>>>>> I do believe that you should know why the community decided to create a
>>>>>> separate edition for the Hadoop Accelerator. What was the reason for that?
>>>>>> Presently, as I see it, it brings more confusion and difficulties rather
>>>>>> than benefit.
>>>>>> 
>>>>>> —
>>>>>> Denis
>>>>>> 
>>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>> 
>>>>>> In fact I very much agree with you. Right now, running the "accelerator"
>>>>>> component in the Bigtop distro gives one a pretty much complete fabric
>>>>>> anyway. But in order to make just an accelerator component we perform
>>>>>> quite a bit of voodoo magic during the packaging stage of the Bigtop
>>>>>> build, shuffling jars from here and there. And that's quite crazy,
>>>>>> honestly ;)
>>>>>> 
>>>>>> Cos
>>>>>> 
>>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
>>>>>> 
>>>>>> I tend to agree with Denis. I see only these differences between Hadoop
>>>>>> Accelerator and Fabric builds (correct me if I missed something):
>>>>>> 
>>>>>> - Limited set of available modules and no optional modules in Hadoop
>>>>>> Accelerator.
>>>>>> - No ignite-hadoop module in Fabric.
>>>>>> - Additional scripts, configs and instructions included in Hadoop
>>>>>> Accelerator.
>>>>>> 
>>>>>> And the list of included modules frankly looks very weird. Here are only
>>>>>> some of the issues I noticed:
>>>>>> 
>>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need them
>>>>>> for Hadoop Acceleration (which I doubt), are they really required or can
>>>>>> they be optional?
>>>>>> - We force users to use the ignite-log4j module without providing other
>>>>>> logger options (e.g., SLF4J).
>>>>>> - We don't include the ignite-aws module. How does one use Hadoop
>>>>>> Accelerator with S3 discovery?
>>>>>> - Etc.
>>>>>> 
>>>>>> It seems to me that if we try to fix all these issues, there will be
>>>>>> virtually no difference between the Fabric and Hadoop Accelerator builds
>>>>>> except a couple of scripts and config files. If so, there is no reason to
>>>>>> have two builds.
>>>>>> 
>>>>>> -Val
>>>>>> 
>>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dmagda@apache.org> wrote:
>>>>>> 
>>>>>> On a separate note, in Bigtop, we have started looking into changing the
>>>>>> way we deliver Ignite and we'll likely start offering the whole 'data
>>>>>> fabric' experience instead of the mere "hadoop-acceleration".
>>>>>> 
>>>>>> 
>>>>>> And you will still be using the hadoop-accelerator libs of Ignite, right?
>>>>>> 
>>>>>> I'm wondering whether there is a need to keep releasing Hadoop Accelerator
>>>>>> as a separate delivery.
>>>>>> What if we start releasing the accelerator as part of the standard fabric
>>>>>> binary, putting the hadoop-accelerator libs under the 'optional' folder?
>>>>>> 
>>>>>> —
>>>>>> Denis
>>>>>> 
>>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>> 
>>>>>> What Denis said: spark has been added to the Hadoop accelerator as a way
>>>>>> to boost the performance of more than just MR compute of the Hadoop
>>>>>> stack, IIRC.
>>>>>> 
>>>>>> For what it's worth, Spark is considered a part of Hadoop at large.
>>>>>> 
>>>>>> On a separate note, in Bigtop, we have started looking into changing the
>>>>>> way we deliver Ignite and we'll likely start offering the whole 'data
>>>>>> fabric' experience instead of the mere "hadoop-acceleration".
>>>>>> 
>>>>>> Cos
>>>>>> 
>>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>>>>>> 
>>>>>> Val,
>>>>>> 
>>>>>> The Ignite Hadoop module includes not only the map-reduce accelerator but
>>>>>> the Ignite Hadoop File System component as well. The latter can be used
>>>>>> in deployments like HDFS + IGFS + Ignite Spark + Spark.
>>>>>> 
>>>>>> Considering this, I'm for the second solution proposed by you: put both
>>>>>> the 2.10 and 2.11 ignite-spark modules under the 'optional' folder of the
>>>>>> Ignite Hadoop Accelerator distribution.
>>>>>> https://issues.apache.org/jira/browse/IGNITE-4254
>>>>>> 
>>>>>> 
>>>>>> BTW, this task may be affected by or related to the following ones:
>>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
>>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
>>>>>> 
>>>>>> —
>>>>>> Denis
>>>>>> 
>>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko
>>>>>> <valentin.kulichenko@gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
>>>>>> Hadoop when running its jobs. The ignite-spark module only provides
>>>>>> IgniteRDD, which Hadoop obviously will never use.
>>>>>> 
>>>>>> Is there another use case for Hadoop Accelerator which I'm missing?
>>>>>> 
>>>>>> -Val
>>>>>> 
>>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan
>>>>>> <dsetrakyan@apache.org> wrote:
>>>>>> 
>>>>>> Why do you think that spark module is not needed in our hadoop build?
>>>>>> 
>>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
>>>>>> valentin.kulichenko@gmail.com> wrote:
>>>>>> 
>>>>>> Folks,
>>>>>> 
>>>>>> Is there anyone who understands the purpose of including the ignite-spark
>>>>>> module in the Hadoop Accelerator build? I can't figure out a use case for
>>>>>> which it's needed.
>>>>>> 
>>>>>> In case we actually need it there, there is an issue then. We actually
>>>>>> have two ignite-spark modules, for Scala 2.10 and 2.11. In the Fabric
>>>>>> build everything is good: we put both in the 'optional' folder and the
>>>>>> user can enable either one. But in the Hadoop Accelerator build there is
>>>>>> only 2.11, which means that the build doesn't work with Scala 2.10 out of
>>>>>> the box.
>>>>>> 
>>>>>> We should either remove the module from the build, or fix the issue.
>>>>>> 
>>>>>> -Val
>>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Sergey Kozlov
>>> GridGain Systems
>>> www.gridgain.com
>> 
>> 

