flink-dev mailing list archives

From Katherin Eri <katherinm...@gmail.com>
Subject Re: New Flink team member - Kate Eri.
Date Mon, 13 Feb 2017 13:09:15 GMT
Hello guys,



It seems that issue FLINK-1730
<https://issues.apache.org/jira/browse/FLINK-1730> significantly impacts the
integration of Flink with SystemML.

They have checked several integrations, and Flink's integration is the slowest
<https://github.com/apache/incubator-systemml/pull/119#issuecomment-222059794>:

   - MR: LinregDS: 147s (2 jobs); LinregCG w/ 6 iterations: 361s (8 jobs)
   w/ mmchain; 628s (14 jobs) w/o mmchain
   - Spark: LinregDS: 71s (3 jobs); LinregCG w/ 6 iterations: 41s (8 jobs)
   w/ mmchain; 48s (14 jobs) w/o mmchain
   - Flink: LinregDS: 212s (3 jobs); LinregCG w/ 6 iterations: 1,047s (14
   jobs) w/o mmchain
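For context, the relative slowdowns implied by these numbers (comparing the w/o-mmchain LinregCG runs and the LinregDS totals against Spark) can be worked out with simple arithmetic on the figures quoted above:

```python
# Quoted benchmark wall-clock times in seconds (from the SystemML PR comment).
linreg_cg = {"MR": 628, "Spark": 48, "Flink": 1047}  # LinregCG, 6 iter, w/o mmchain
linreg_ds = {"MR": 147, "Spark": 71, "Flink": 212}   # LinregDS

# Slowdown of the Flink integration relative to the Spark one.
cg_slowdown = linreg_cg["Flink"] / linreg_cg["Spark"]  # ~21.8x
ds_slowdown = linreg_ds["Flink"] / linreg_ds["Spark"]  # ~3.0x
print(f"LinregCG: Flink ~{cg_slowdown:.1f}x slower than Spark")
print(f"LinregDS: Flink ~{ds_slowdown:.1f}x slower than Spark")
```

The iterative LinregCG workload suffers far more than the single-pass LinregDS, which is consistent with the caching issue (FLINK-1730) dominating the gap.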

As Felix already said, this is caused by two issues:

1)      FLINK-1730 <https://issues.apache.org/jira/browse/FLINK-1730>

2)      FLINK-4175 <https://issues.apache.org/jira/browse/FLINK-4175>

Since FLINK-1730 is not assigned to anyone, we would like to take this
ticket (my colleagues could try to implement it).

I would like to continue further discussion related to FLINK-1730 in the
appropriate ticket.
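To illustrate why the missing caching operator hurts iterative algorithms, here is a minimal, engine-agnostic sketch in plain Python; the loader and iteration count are hypothetical, but the recomputation pattern is the one discussed in this thread: without a persist primitive, the input pipeline is re-evaluated on every iteration.

```python
# Sketch: without a persist/cache primitive, a lazily evaluated dataset is
# re-derived from its source on every iteration; with caching it is read once.
reads = {"count": 0}

def load_dataset():
    # Stand-in for an expensive source read + preprocessing pipeline.
    reads["count"] += 1
    return list(range(10))

def run_iterations(n_iter, cached=False):
    reads["count"] = 0
    cache = load_dataset() if cached else None  # materialize once if cached
    total = 0
    for _ in range(n_iter):
        data = cache if cached else load_dataset()  # recompute if not cached
        total += sum(data)
    return reads["count"]

print(run_iterations(6, cached=False))  # 6 source reads
print(run_iterations(6, cached=True))   # 1 source read
```

For a 6-iteration solver like LinregCG, the uncached variant pays the full input-pipeline cost six times over.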


Fri, 10 Feb 2017 at 19:57, Katherin Eri <katherinmail@gmail.com>:

> I have created a ticket to discuss GPU-related questions further:
> https://issues.apache.org/jira/browse/FLINK-5782
>
> Fri, 10 Feb 2017 at 18:16, Katherin Eri <katherinmail@gmail.com>:
>
> Thank you, Trevor!
>
> You have shared very valuable points; I will consider them.
>
> So I think I should finally create a ticket in Flink's JIRA, at least for
> Flink's GPU support, and move the related discussion there?
>
> I will contact Suneel regarding DL4J, thanks!
>
>
> Fri, 10 Feb 2017 at 17:44, Trevor Grant <trevor.d.grant@gmail.com>:
>
> Also RE: DL4J integration.
>
> Suneel had done some work on this a while back, and ran into issues.  You
> might want to chat with him about the pitfalls and 'gotchas' there.
>
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, Feb 10, 2017 at 7:37 AM, Trevor Grant <trevor.d.grant@gmail.com>
> wrote:
>
> > Sorry for chiming in late.
> >
> > GPUs on Flink.  Till raised a good point- you need to be able to fall
> back
> > to non-GPU resources if they aren't available.
> >
> > Fun fact: this has already been developed for Flink vis-a-vis the Apache
> > Mahout project.
> >
> > In short- Mahout exposes a number of tensor functions (vector %*% matrix,
> > matrix %*% matrix, etc).  If compiled for GPU support, those operations
> are
> > completed via GPU- and if no GPUs are in fact available, Mahout math
> falls
> > back to CPUs (and finally back to the JVM).
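The fallback chain described above (GPU, then native CPU, then JVM) can be sketched as a chain of solver backends tried in order of preference; the probe functions and backend names here are purely illustrative, not Mahout's actual API:

```python
# Illustrative fallback chain: prefer GPU, then a native CPU backend
# (e.g. OpenMP), then a plain in-language implementation. Probes are stubbed.
def gpu_available():
    return False  # stub: in reality this would probe for CUDA/ViennaCL

def native_cpu_available():
    return False  # stub: probe for a compiled OpenMP/ViennaCL-OMP backend

def matmul_fallback(a, b):
    # Final fallback: naive matrix multiply, always available.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matmul(a, b):
    if gpu_available():
        raise NotImplementedError("dispatch to GPU backend")
    if native_cpu_available():
        raise NotImplementedError("dispatch to native CPU backend")
    return matmul_fallback(a, b)

print(matmul([[1, 2]], [[3], [4]]))  # [[11]]
```

The key design point is that callers only see `matmul`; which backend actually ran is an internal dispatch decision, so the same user code works with or without GPUs present.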
> >
> > How this should work is Flink takes care of shipping data around the
> > cluster, and when data arrives at the local node- is dumped out to GPU
> for
> > calculation, loaded back up and shipped back around cluster.  In
> practice,
> > the lack of a persist method for intermediate results makes this
> > troublesome (not because of GPUs but for calculating any sort of complex
> > algorithm we expect to be able to cache intermediate results).
> >
> > +1 to FLINK-1730
> >
> > Everything in Mahout is modular- distributed engine
> > (Flink/Spark/Write-your-own), Native Solvers (OpenMP / ViennaCL / CUDA /
> > Write-your-own), algorithms, etc.
> >
> > So to sum up, you're noting the redundancy between ML packages in terms
> of
> > algorithms- I would recommend checking out Mahout before rolling your own
> > GPU integration (else risk redundantly integrating GPUs). If nothing
> else-
> > it should give you some valuable insight regarding design considerations.
> > Also FYI the goal of the Apache Mahout project is to address that problem
> > precisely- implement an algorithm once in a mathematically expressive
> DSL,
> > which is abstracted above the engine so the same code easily ports
> between
> > engines / native solvers (i.e. CPU/GPU).
> >
> > https://github.com/apache/mahout/tree/master/viennacl-omp
> > https://github.com/apache/mahout/tree/master/viennacl
> >
> > Best,
> > tg
> >
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> >
> >
> > On Fri, Feb 10, 2017 at 7:01 AM, Katherin Eri <katherinmail@gmail.com>
> > wrote:
> >
> >> Thank you Felix, for provided information.
> >>
> >> Currently I analyze the provided integration of Flink with SystemML.
> >>
> >> And also gather the information for the ticket  FLINK-1730
> >> <https://issues.apache.org/jira/browse/FLINK-1730>, maybe we will take
> it
> >> to work, to unlock SystemML/Flink integration.
> >>
> >>
> >>
> >> Thu, 9 Feb 2017 at 0:17, Felix Neutatz <neutatz@googlemail.com.invalid>:
> >>
> >> > Hi Kate,
> >> >
> >> > 1) - Broadcast:
> >> >
> >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+
> >> Only+send+data+to+each+taskmanager+once+for+broadcasts
> >> >  - Caching: https://issues.apache.org/jira/browse/FLINK-1730
> >> >
> >> > 2) I have no idea about the GPU implementation. The SystemML mailing
> >> list
> >> > will probably help you out there.
> >> >
> >> > Best regards,
> >> > Felix
> >> >
> >> > 2017-02-08 14:33 GMT+01:00 Katherin Eri <katherinmail@gmail.com>:
> >> >
> >> > > Thank you Felix, for your point, it is quite interesting.
> >> > >
> >> > > I will take a look at the code, of the provided Flink integration.
> >> > >
> >> > > 1)    You mentioned these problems with Flink: >>we realized that the
> >> > > lack of a caching operator and a broadcast issue highly affects the
> >> > > performance. Have you already asked the community about this? If yes,
> >> > > please provide a reference to the ticket or the thread topic.
> >> > >
> >> > > 2)    You said that SystemML provides GPU support. I have seen
> >> > > SystemML's source code and would like to ask: why did you decide to
> >> > > implement your own integration with CUDA? Did you consider ND4J, or
> >> > > do you maintain your own implementation because ND4J is younger?
> >> > >
> >> > > Tue, 7 Feb 2017 at 18:35, Felix Neutatz <neutatz@googlemail.com>:
> >> > >
> >> > > > Hi Katherin,
> >> > > >
> >> > > > we are also working in a similar direction. We implemented a
> >> prototype
> >> > to
> >> > > > integrate with SystemML:
> >> > > > https://github.com/apache/incubator-systemml/pull/119
> >> > > > SystemML provides many different matrix formats, operations, GPU
> >> > support
> >> > > > and a couple of DL algorithms. Unfortunately, we realized that the
> >> lack
> >> > > of
> >> > > > a caching operator and a broadcast issue highly affects the
> >> performance
> >> > > > (e.g. compared to Spark). At the moment I am trying to tackle the
> >> > > broadcast
> >> > > > issue. But caching is still a problem for us.
> >> > > >
> >> > > > Best regards,
> >> > > > Felix
> >> > > >
> >> > > > 2017-02-07 16:22 GMT+01:00 Katherin Eri <katherinmail@gmail.com>:
> >> > > >
> >> > > > > Thank you, Till.
> >> > > > >
> >> > > > > 1)      Regarding ND4J, I didn't know about such an unfortunate and
> >> > > > > critical restriction -> the lack of sparsity optimizations, and you
> >> > > > > are right: this issue is still open for them. I saw that Flink uses
> >> > > > > Breeze, but I thought its usage was due to historical reasons.
> >> > > > >
> >> > > > > 2)      Regarding integration with DL4J, I have read the source code
> >> > > > > of the DL4J/Spark integration; that's why I have set aside my idea
> >> > > > > of reusing their word2vec implementation for now. I can perform a
> >> > > > > deeper investigation of this topic if it is required.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > So I feel that we have the following picture:
> >> > > > >
> >> > > > > 1)      DL integration investigation could be part of Apache Bahir.
> >> > > > > I can perform further investigation of this topic, but I think we
> >> > > > > need a separate ticket to track this activity.
> >> > > > >
> >> > > > > 2)      GPU support, required for DL, is interesting, but requires
> >> > > > > ND4J, for example.
> >> > > > >
> >> > > > > 3)      ND4J couldn’t be incorporated because it doesn’t support
> >> > > sparsity
> >> > > > > <https://deeplearning4j.org/roadmap.html> [1].
> >> > > > >
> >> > > > > Regarding ND4J: is this the single blocker for incorporating it, or
> >> > > > > are there other known ones?
> >> > > > >
> >> > > > >
> >> > > > > [1] https://deeplearning4j.org/roadmap.html
> >> > > > >
> >> > > > > Tue, 7 Feb 2017 at 16:26, Till Rohrmann <trohrmann@apache.org>:
> >> > > > >
> >> > > > > Thanks for initiating this discussion Katherin. I think you're
> >> right
> >> > > that
> >> > > > > in general it does not make sense to reinvent the wheel over and
> >> over
> >> > > > > again. Especially if you only have limited resources at hand. So
> >> if
> >> > we
> >> > > > > could integrate Flink with some existing library that would be
> >> great.
> >> > > > >
> >> > > > > In the past, however, we couldn't find a good library which
> >> provided
> >> > > > enough
> >> > > > > freedom to integrate it with Flink. Especially if you want to
> have
> >> > > > > distributed and somewhat high-performance implementations of ML
> >> > > > algorithms
> >> > > > > you would have to take Flink's execution model (capabilities as
> >> well
> >> > as
> >> > > > > limitations) into account. That is mainly the reason why we
> >> started
> >> > > > > implementing some of the algorithms "natively" on Flink.
> >> > > > >
> >> > > > > If I remember correctly, then the problem with ND4J was and
> still
> >> is
> >> > > that
> >> > > > > it does not support sparse matrices which was a requirement from
> >> our
> >> > > > side.
> >> > > > > As far as I know, it is quite common that you have sparse data
> >> > > structures
> >> > > > > when dealing with large scale problems. That's why we built our
> >> own
> >> > > > > abstraction which can have different implementations. Currently,
> >> the
> >> > > > > default implementation uses Breeze.
> >> > > > >
> >> > > > > I think the support for GPU based operations and the actual
> >> resource
> >> > > > > management are two orthogonal things. The implementation would
> >> have
> >> > to
> >> > > > work
> >> > > > > with no GPUs available anyway. If the system detects that GPUs
> are
> >> > > > > available, then ideally it would exploit them. Thus, we could
> add
> >> > this
> >> > > > > feature later and maybe integrate it with FLINK-5131 [1].
> >> > > > >
> >> > > > > Concerning the integration with DL4J I think that Theo's
> proposal
> >> to
> >> > do
> >> > > > it
> >> > > > > in a separate repository (maybe as part of Apache Bahir) is a
> good
> >> > > idea.
> >> > > > > We're currently thinking about outsourcing some of Flink's
> >> libraries
> >> > > into
> >> > > > > sub projects. This could also be an option for the DL4J
> >> integration
> >> > > then.
> >> > > > > In general I think it should be feasible to run DL4J on Flink
> >> given
> >> > > that
> >> > > > it
> >> > > > > also runs on Spark. Have you already looked at it closer?
> >> > > > >
> >> > > > > [1] https://issues.apache.org/jira/browse/FLINK-5131
> >> > > > >
> >> > > > > Cheers,
> >> > > > > Till
> >> > > > >
> >> > > > > On Tue, Feb 7, 2017 at 11:47 AM, Katherin Eri <
> >> > katherinmail@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Thank you Theodore, for your reply.
> >> > > > > >
> >> > > > > > 1)    Regarding GPU, your point is clear and I agree with it; ND4J
> >> > > > > > looks appropriate. But my current understanding is that we also
> >> > > > > > need to cover some resource management questions -> when we provide
> >> > > > > > GPU support, we also need to manage GPUs as a resource. For
> >> > > > > > example, Mesos already supports GPUs as a resource item: Initial
> >> > > > > > support for GPU resources
> >> > > > > > <https://issues.apache.org/jira/browse/MESOS-4424?jql=text%20~%20GPU>.
> >> > > > > > Flink uses Mesos as a cluster manager, which means this feature of
> >> > > > > > Mesos could be reused. Memory management questions in Flink
> >> > > > > > regarding GPU should also be clarified.
> >> > > > > >
> >> > > > > > 2)    Regarding integration with DL4J: what stops us from creating
> >> > > > > > a ticket and starting the discussion around this topic? Do we need
> >> > > > > > some user story, or is the community not sure that DL is really
> >> > > > > > helpful? Why did the discussion with Adam Gibson end with no
> >> > > > > > implementation of any idea? What concerns do we have?
> >> > > > > >
> >> > > > > > Mon, 6 Feb 2017 at 15:01, Theodore Vasiloudis <
> >> > > > > > theodoros.vasiloudis@gmail.com>:
> >> > > > > >
> >> > > > > > > Hello all,
> >> > > > > > >
> >> > > > > > > This is a point that has come up in the past: Given the
> >> multitude
> >> > of
> >> > > ML
> >> > > > > > > libraries out there, should we have native implementations
> in
> >> > > FlinkML
> >> > > > > or
> >> > > > > > > try to integrate other libraries instead?
> >> > > > > > >
> >> > > > > > > We haven't managed to reach a consensus on this before. My
> >> > opinion
> >> > > is
> >> > > > > > that
> >> > > > > > > there is definitely value in having ML algorithms written
> >> > natively
> >> > > in
> >> > > > > > > Flink, both for performance optimization,
> >> > > > > > > but more importantly for engineering simplicity, we don't
> >> want to
> >> > > > force
> >> > > > > > > users to use yet another piece of software to run their ML
> >> algos
> >> > > (at
> >> > > > > > least
> >> > > > > > > for a basic set of algorithms).
> >> > > > > > >
> >> > > > > > > We have in the past  discussed integrations with DL4J
> >> > (particularly
> >> > > > > ND4J)
> >> > > > > > > with Adam Gibson, the core developer of the library, but we
> >> never
> >> > > got
> >> > > > > > > around to implementing anything.
> >> > > > > > >
> >> > > > > > > Whether it makes sense to have an integration with DL4J as
> >> part
> >> > of
> >> > > > the
> >> > > > > > > Flink distribution would be up for discussion. I would
> >> suggest to
> >> > > > make
> >> > > > > it
> >> > > > > > > an independent repo to allow for
> >> > > > > > > faster dev/release cycles, and because it wouldn't be
> directly
> >> > > > related
> >> > > > > to
> >> > > > > > > the core of Flink so it would add extra reviewing burden to
> an
> >> > > > already
> >> > > > > > > overloaded group of committers.
> >> > > > > > >
> >> > > > > > > Natively supporting GPU calculations in Flink would be much
> >> > better
> >> > > > > > achieved
> >> > > > > > > through a library like ND4J, the engineering burden would be
> >> too
> >> > > much
> >> > > > > > > otherwise.
> >> > > > > > >
> >> > > > > > > Regards,
> >> > > > > > > Theodore
> >> > > > > > >
> >> > > > > > > On Mon, Feb 6, 2017 at 11:26 AM, Katherin Eri <
> >> > > > katherinmail@gmail.com>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Hello, guys.
> >> > > > > > > >
> >> > > > > > > > Theodore, last week I started the review of the PR:
> >> > > > > > > > https://github.com/apache/flink/pull/2735 related to
> >> *word2Vec
> >> > > for
> >> > > > > > > Flink*.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > During this review I have asked myself: why do we need to
> >> > > implement
> >> > > > > > such
> >> > > > > > > a
> >> > > > > > > > very popular algorithm like *word2vec one more time*, when
> >> > there
> >> > > is
> >> > > > > > > already
> >> > > > > > > > an available implementation in Java provided by the
> >> > deeplearning4j.org
> >> > > > > > > > <https://deeplearning4j.org/word2vec> library (DL4J ->
> >> Apache
> >> > 2
> >> > > > > > > licence).
> >> > > > > > > > This library tries to promote itself, there is a hype
> >> around it
> >> > > in
> >> > > > ML
> >> > > > > > > > sphere, and it was integrated with Apache Spark, to
> provide
> >> > > > scalable
> >> > > > > > > > deeplearning calculations.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > *That's why I thought: could we also integrate Flink with
> >> > > > > > > > this library? *
> >> > > > > > > >
> >> > > > > > > > 1) Personally I think, providing support and deployment of
> >> > > > > > > > *Deeplearning(DL)
> >> > > > > > > > algorithms/models in Flink* is promising and attractive
> >> > feature,
> >> > > > > > because:
> >> > > > > > > >
> >> > > > > > > >     a) during the last two years DL proved its efficiency and
> >> > > > > > > > these algorithms are used in many applications. For example
> >> > > > > > > > *Spotify *uses DL
> >> based
> >> > > > > > algorithms
> >> > > > > > > > for music content extraction: Recommending music on
> Spotify
> >> > with
> >> > > > deep
> >> > > > > > > > learning AUGUST 05, 2014
> >> > > > > > > > <http://benanne.github.io/2014/08/05/spotify-cnns.html>
> for
> >> > > their
> >> > > > > > music
> >> > > > > > > > recommendations. Developers need to scale DL up manually,
> >> > > > > > > > which causes a lot of work; that's why platforms like Flink
> >> > > > > > > > should support deployment of these models.
> >> > > > > > > >
> >> > > > > > > >     b) Here is the scope of Deeplearning use cases
> >> > > > > > > > <https://deeplearning4j.org/use_cases>; many of these relate
> >> > > > > > > > to scenarios that could be supported on Flink.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > 2) But DL raises questions such as:
> >> > > > > > > >
> >> > > > > > > >     a) scaling calculations across machines
> >> > > > > > > >
> >> > > > > > > >     b) performing these calculations on both CPU and GPU. GPU
> >> > > > > > > > is required to train big DL models, otherwise the learning
> >> > > > > > > > process could have very slow convergence.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > 3) I have checked this DL4J library, which already has rich
> >> > > > support
> >> > > > > > of
> >> > > > > > > > many attractive DL models like: Recurrent Networks and
> >> LSTMs,
> >> > > > > > > Convolutional
> >> > > > > > > > Networks (CNN), Restricted Boltzmann Machines (RBM) and
> >> others.
> >> > > So
> >> > > > we
> >> > > > > > > won’t
> >> > > > > > > > need to implement them independently, but only provide the
> >> > > > > > > > ability to execute these models on a Flink cluster, in quite a
> >> > > > > > > > similar way to how it was integrated with Apache Spark.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Because of all of this I propose:
> >> > > > > > > >
> >> > > > > > > > 1)    To create a new ticket in Flink's JIRA for integration
> >> > > > > > > > of Flink with DL4J and decide on which side this integration
> >> > > > > > > > should be implemented.
> >> > > > > > > >
> >> > > > > > > > 2)    To natively support GPU resources in Flink and allow
> >> > > > > > > > calculations over them, as described in this publication:
> >> > > > > > > > https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > *Regarding original issue Implement Word2Vec
> >> > > > > > > > <https://issues.apache.org/jira/browse/FLINK-2094>in
> Flink,
> >> > *I
> >> > > > have
> >> > > > > > > > investigated its implementation in DL4J and  that
> >> > implementation
> >> > > of
> >> > > > > > > > integration DL4J with Apache Spark, and got several
> points:
> >> > > > > > > >
> >> > > > > > > > It seems that the idea of building our own implementation of
> >> > > > > > > > word2vec in Flink is not such a bad solution, because DL4J was
> >> > > > > > > > forced to reimplement its original word2Vec on Spark. I have
> >> > > > > > > > checked the integration of DL4J with Spark and found that it
> >> > > > > > > > is too strongly coupled with the Spark API, so it is
> >> > > > > > > > impossible to just take some DL4J API and reuse it; instead we
> >> > > > > > > > would need to implement an independent integration for Flink.
> >> > > > > > > >
> >> > > > > > > > *That's why we can simply finish the implementation of the
> >> > > > > > > > current PR **independently **of the DL4J integration.*
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Could you please share your opinion regarding my questions
> >> > > > > > > > and points?
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Mon, 6 Feb 2017 at 12:51, Katherin Eri <katherinmail@gmail.com>:
> >> > > > > > > >
> >> > > > > > > > > Sorry, guys I need to finish this letter first.
> >> > > > > > > > >   Full version of it will come shortly.
> >> > > > > > > > >
> >> > > > > > > > > Mon, 6 Feb 2017 at 12:49, Katherin Eri <katherinmail@gmail.com>:
> >> > > > > > > > >
> >> > > > > > > > > Hello, guys.
> >> > > > > > > > > Theodore, last week I started the review of the PR:
> >> > > > > > > > > https://github.com/apache/flink/pull/2735 related to
> >> > *word2Vec
> >> > > > for
> >> > > > > > > > Flink*.
> >> > > > > > > > >
> >> > > > > > > > > During this review I have asked myself: why do we need
> to
> >> > > > implement
> >> > > > > > > such
> >> > > > > > > > a
> >> > > > > > > > > very popular algorithm like *word2vec one more time*,
> when
> >> > > there
> >> > > > is
> >> > > > > > > > > already available implementation in Java provided by
> >> > > > > > deeplearning4j.org
> >> > > > > > > > > <https://deeplearning4j.org/word2vec> library (DL4J ->
> >> > Apache
> >> > > 2
> >> > > > > > > > licence).
> >> > > > > > > > > This library tries to promote itself, there is a hype
> >> around
> >> > > it
> >> > > > in
> >> > > > > > ML
> >> > > > > > > > > sphere, and  it was integrated with Apache Spark, to
> >> provide
> >> > > > > scalable
> >> > > > > > > > > deeplearning calculations.
> >> > > > > > > > > That's why I thought: could we integrate with this
> >> library or
> >> > > not
> >> > > > > > also
> >> > > > > > > > and
> >> > > > > > > > > Flink?
> >> > > > > > > > > 1) Personally I think, providing support and deployment
> of
> >> > > > > > Deeplearning
> >> > > > > > > > > algorithms/models in Flink is promising and attractive
> >> > feature,
> >> > > > > > > because:
> >> > > > > > > > >     a) during last two years deeplearning proved its
> >> > efficiency
> >> > > > and
> >> > > > > > > this
> >> > > > > > > > > algorithms used in many applications. For example
> *Spotify
> >> > > *uses
> >> > > > DL
> >> > > > > > > based
> >> > > > > > > > > algorithms for music content extraction: Recommending
> >> music
> >> > on
> >> > > > > > Spotify
> >> > > > > > > > > with deep learning AUGUST 05, 2014
> >> > > > > > > > > <http://benanne.github.io/2014/08/05/spotify-cnns.html>
> >> for
> >> > > > their
> >> > > > > > > music
> >> > > > > > > > > recommendations. Doing this natively scalable is very
> >> > > attractive.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > I have investigated that implementation of integration
> >> DL4J
> >> > > with
> >> > > > > > Apache
> >> > > > > > > > > Spark, and got several points:
> >> > > > > > > > >
> >> > > > > > > > > 1) It seems that idea of building of our own
> >> implementation
> >> > of
> >> > > > > > word2vec
> >> > > > > > > > > not such a bad solution, because the integration of DL4J
> >> with
> >> > > > Spark
> >> > > > > > is
> >> > > > > > > > too
> >> > > > > > > > > strongly coupled with Spark API and it will take time
> from
> >> > the
> >> > > > side
> >> > > > > > of
> >> > > > > > > > DL4J
> >> > > > > > > > > to adopt this integration to Flink. Also I have expected
> >> that
> >> > > we
> >> > > > > will
> >> > > > > > > be
> >> > > > > > > > > able to call just some API, it is not such thing.
> >> > > > > > > > > 2)
> >> > > > > > > > >
> >> > > > > > > > > https://deeplearning4j.org/use_cases
> >> > > > > > > > > https://www.analyticsvidhya.com/blog/2017/01/t-sne-
> >> > > > > > > > implementation-r-python/
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Thu, 19 Jan 2017 at 13:29, Till Rohrmann <trohrmann@apache.org>:
> >> > > > > > > > >
> >> > > > > > > > > Hi Katherin,
> >> > > > > > > > >
> >> > > > > > > > > welcome to the Flink community. Always great to see new
> >> > people
> >> > > > > > joining
> >> > > > > > > > the
> >> > > > > > > > > community :-)
> >> > > > > > > > >
> >> > > > > > > > > Cheers,
> >> > > > > > > > > Till
> >> > > > > > > > >
> >> > > > > > > > > On Tue, Jan 17, 2017 at 1:02 PM, Katherin Sotenko <
> >> > > > > > > > katherinmail@gmail.com>
> >> > > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > ok, I've got it.
> >> > > > > > > > > > I will take a look at
> >> > > > https://github.com/apache/flink/pull/2735
> >> > > > > .
> >> > > > > > > > > >
> >> > > > > > > > > > Tue, 17 Jan 2017 at 14:36, Theodore Vasiloudis <
> >> > > > > > > > > > theodoros.vasiloudis@gmail.com>:
> >> > > > > > > > > >
> >> > > > > > > > > > > Hello Katherin,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Welcome to the Flink community!
> >> > > > > > > > > > >
> >> > > > > > > > > > > The ML component definitely needs a lot of work you
> >> are
> >> > > > > correct,
> >> > > > > > we
> >> > > > > > > > are
> >> > > > > > > > > > > facing similar problems to CEP, which we'll
> hopefully
> >> > > resolve
> >> > > > > > with
> >> > > > > > > > the
> >> > > > > > > > > > > restructuring Stephan has mentioned in that thread.
> >> > > > > > > > > > >
> >> > > > > > > > > > > If you'd like to help out with PRs we have many
> open,
> >> > one I
> >> > > > > have
> >> > > > > > > > > started
> >> > > > > > > > > > > reviewing but got side-tracked is the Word2Vec one
> >> [1].
> >> > > > > > > > > > >
> >> > > > > > > > > > > Best,
> >> > > > > > > > > > > Theodore
> >> > > > > > > > > > >
> >> > > > > > > > > > > [1] https://github.com/apache/flink/pull/2735
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske <
> >> > > > > > fhueske@gmail.com
> >> > > > > > > >
> >> > > > > > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Hi Katherin,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > welcome to the Flink community!
> >> > > > > > > > > > > > Help with reviewing PRs is always very welcome
> and a
> >> > > great
> >> > > > > way
> >> > > > > > to
> >> > > > > > > > > > > > contribute.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Best, Fabian
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > 2017-01-17 11:17 GMT+01:00 Katherin Sotenko <
> >> > > > > > > > katherinmail@gmail.com
> >> > > > > > > > > >:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > Thank you, Timo.
> >> > > > > > > > > > > > > I have started the analysis of the topic.
> >> > > > > > > > > > > > > And if it necessary, I will try to perform the
> >> review
> >> > > of
> >> > > > > > other
> >> > > > > > > > > pulls)
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > Tue, 17 Jan 2017 at 13:09, Timo Walther <
> >> > > > > > twalthr@apache.org
> >> > > > > > > >:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hi Katherin,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > great to hear that you would like to
> contribute!
> >> > > > Welcome!
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > I gave you contributor permissions. You can
> now
> >> > > assign
> >> > > > > > issues
> >> > > > > > > > to
> >> > > > > > > > > > > > > > yourself. I assigned FLINK-1750 to you.
> >> > > > > > > > > > > > > > Right now there are many open ML pull
> requests,
> >> you
> >> > > are
> >> > > > > > very
> >> > > > > > > > > > welcome
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > > review the code of others, too.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Timo
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On 17/01/17 at 10:39, Katherin Sotenko wrote:
> >> > > > > > > > > > > > > > > Hello, All!
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > I'm Kate Eri, a Java developer with 6 years of
> >> > > > > > > > > > > > > > enterprise experience; I also have some expertise
> >> > > > > > > > > > > > > > with Scala (half a year).
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Last 2 years I have participated in several
> >> > BigData
> >> > > > > > > projects
> >> > > > > > > > > that
> >> > > > > > > > > > > > were
> >> > > > > > > > > > > > > > > related to Machine Learning (Time series
> >> > analysis,
> >> > > > > > > > Recommender
> >> > > > > > > > > > > > systems,
> >> > > > > > > > > > > > > > > Social networking) and ETL. I have
> experience
> >> > with
> >> > > > > > Hadoop,
> >> > > > > > > > > Apache
> >> > > > > > > > > > > > Spark
> >> > > > > > > > > > > > > > and
> >> > > > > > > > > > > > > > > Hive.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > I'm fond of the ML topic, and I see that the
> >> > > > > > > > > > > > > > > Flink project requires some work in this area,
> >> > > > > > > > > > > > > > > so that's why I would like to join Flink and
> >> > > > > > > > > > > > > > > ask you to assign the ticket
> >> > > > > > > > > > > > > > > https://issues.apache.org/jira/browse/FLINK-1750
> >> > > > > > > > > > > > > > > to me.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
>
