apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: Megh operator library
Date Mon, 28 Nov 2016 23:26:57 GMT
I am not sure where we are on this. As the Apex community, we need to take a
hard look before we reject code that passes license and normal pull request
requirements. A lot of Megh code is in production and is stable. Is there
a reason why we cannot accommodate Megh code in a directory that clarifies
its origins? Even if there are duplicates, the code can still be taken into a
directory that marks it as such.

Without this, a lot of custom customer code that they want to contribute to
Malhar will be stuck in "replace with ...". That will not happen, as folks
do not change production code once it works and has stabilized. If the word
"contrib" is an issue, I suggest we get a new name. HDHT etc. are in
production and it makes sense to let them reside in a directory (to be
named) in Malhar.

Thks
Amol


On Mon, Sep 26, 2016 at 10:22 PM, Pramod Immaneni <pramod@datatorrent.com>
wrote:

> Added a section for flume based on the feedback.
>
> Thanks
>
> On Mon, Sep 26, 2016 at 8:51 AM, Pramod Immaneni <pramod@datatorrent.com>
> wrote:
>
> > Hi Thomas,
> >
> > My responses are inline
> >
> > On Sun, Sep 25, 2016 at 11:39 AM, Thomas Weise <thomas.weise@gmail.com>
> > wrote:
> >
> >> Thanks for putting it together. It looks like there are really only 2
> >> operators?
> >>
> >
> > There were others, but it looked like there were already good
> > implementations or alternatives for them in Malhar. For example, enrichment
> > and deduper have implementations already, and for the laggards operator it
> > looked like the concept is already covered in the new windowing work.
> >
> >
> >>
> >> +1 for the Flume connector. It would be good to also look at what has
> >> changed in Flume since it was written. It needs its own Maven module and
> >> documentation is also needed.
> >>
> >
> > Yes in the table in the document I have it going to its own module and
> > path. Will make a note in the document about checking against newer flume
> > versions and documentation.
> >
> >
> >> I don't agree with the proposed "as-is" move for the dimension compute
> >> operator into contrib. It does not belong there. Contrib is for new,
> >> incomplete work ("immature" and under the radar WRT CI etc.), with
> >> particular focus to provide an easier entry path for new contributors.
> >>
> >> I would like to see the following changes to dimension computation:
> >> * Replace HDHT with managed state (or spillable DS)
> >> * Move to org.apache.apex.malhar.lib.*
> >> * Documentation (your draft is a good start towards that), it also needs
> >> to cover query support.
> >>
> >> I think it is a very valuable operator that should be a first class
> >> citizen and the folks familiar with the operator and state management
> >> should take up the work to port it. Tim indicated he may be able to take
> >> it up.
> >>
> >> In the meantime, the operator can remain in the Megh repository under
> >> existing name and consumed from there.
> >>
> >
> > I thought it could eventually have its own module under Malhar but
> > suggested contrib as an intermediate location till any porting is
> > completed. I agree with the documentation, I just wrote up something quick
> > to highlight the operator, Tim has more detailed docs for it I think. Since
> > the operator(s) are readily usable in production applications and implement
> > quite a bit of valuable functionality, I am of the opinion that we do the
> > minimal now to make it available and in parallel start the work on porting
> > some of the internal subsystems to newer components.
> >
> > Thanks
> >
> >
> >>
> >> Thomas
> >>
> >> On Sat, Sep 24, 2016 at 12:29 PM, Pramod Immaneni <pramod@datatorrent.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > Here is the initial proposal. Please go through it and you can comment
> >> > right on the document. Regarding the discussions around Dimensional
> >> > operators, there is a specific section for it and future plans. After
> >> > the comments are addressed, I can start with one of the components such
> >> > as flume and document the steps involved. Then others can take up the
> >> > other components and use the steps in a similar fashion.
> >> >
> >> > https://docs.google.com/document/d/1BzWAwJDEUs0G42DWTuGYvM5sm0Uu5nTP7cUQOAlVs0g
> >> >
> >> > Thanks
> >> >
> >> > On Sat, Sep 10, 2016 at 10:29 AM, Amol Kekre <amol@datatorrent.com>
> >> wrote:
> >> >
> >> > > Thomas,
> >> > > IMHO we should also look at the cost to users of keeping code in a
> >> > > github (even if under ASF 2.0 license) outside Malhar. There is value
> >> > > to deprecating code in Megh, and moving it to Malhar. Volunteers in
> >> > > this effort could decide on how much overlap means "mark as
> >> > > overlapping". My suggestion is to absorb overlapping operators into a
> >> > > directory in Malhar that marks it as such. A lot of these operators
> >> > > are being used in production and it makes sense to absorb them into
> >> > > Apache gitHub.
> >> > >
> >> > > Thks
> >> > > Amol
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Sat, Sep 10, 2016 at 7:20 AM, Pramod Immaneni <pramod@datatorrent.com>
> >> > > wrote:
> >> > >
> >> > > > It would be great to have Tim's help with dimension computation but
> >> > > > I think we can still debate whether HDHT dependency needs to be
> >> > > > removed before contribution or whether it can be done as a two step
> >> > > > process, since we also have a place to put experimental code
> >> > > > (contrib), and HDHT could go in there till we can determine/port it
> >> > > > to use managed state.
> >> > > >
> >> > > > My thought on this is that if it is going to be a significant
> >> > > > porting effort then we do it as a two step process.
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > > > On Sep 9, 2016, at 11:52 PM, Thomas Weise <thomas@datatorrent.com>
> >> > > > wrote:
> >> > > > >
> >> > > > > Tim,
> >> > > > >
> >> > > > > The functionality of the dimension compute operator should be
> >> > > > > available in Malhar. My concern is moving things without regard to
> >> > > > > code duplication and long term maintenance cost. There are several
> >> > > > > pieces to the dimension compute operator that in fact are (or
> >> > > > > should be) reusable components by themselves. Live querying
> >> > > > > (queryable state) with schemas is one such example. It's a major
> >> > > > > feature and not limited to the dimension compute operator. It
> >> > > > > should ideally work with the new windowing support as well. But
> >> > > > > the main area that needs work is the state store - the dependency
> >> > > > > on HDHT needs to be removed and replaced with managed state. Also
> >> > > > > I'm curious why the window operator should not scale for large
> >> > > > > time buckets? Are you referring to the current intermediate
> >> > > > > implementation or the work in progress that will use incremental
> >> > > > > state saving? If so, please bring it up on APEXMALHAR-2130 as it
> >> > > > > is pretty important.
> >> > > > >
> >> > > > > Since you have written almost all of the dimension compute code,
> >> > > > > could you help with the changes needed to bring it over? It would
> >> > > > > also be good to see the user documentation in Malhar.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Thomas
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <timothyfarkas@apache.org>
> >> > > > > wrote:
> >> > > > >
> >> > > > >> Hi Thomas,
> >> > > > >>
> >> > > > >> With respect to the dimension operator, I would like to learn more
> >> > > > >> about the underlying framework you mentioned and the code
> >> > > > >> duplication. If you are talking about the Window operator
> >> > > > >> framework, that framework is not suitable for the dimension
> >> > > > >> computation use case because it doesn't scale for large
> >> > > > >> timebuckets. Furthermore that framework has no support for
> >> > > > >> Querying. The dimension operators support live queries of the
> >> > > > >> aggregated data. Querying of live data streams is a popular
> >> > > > >> feature in other open source platforms, and I believe it is a
> >> > > > >> worthwhile addition to Malhar.
> >> > > > >>
> >> > > > >> Given the fact that the dimension framework has been used in many
> >> > > > >> POCs and is even running in production and has novel features
> >> > > > >> like live querying, it more than meets the bar for a malhar
> >> > > > >> contribution. If a concrete argument cannot be provided to
> >> > > > >> prevent this work from going into Malhar, then these efforts
> >> > > > >> should not be blocked.
> >> > > > >>
> >> > > > >> Thanks,
> >> > > > >> Tim
> >> > > > >>
> >> > > > >>> On 2016-09-09 17:18 (-0700), Thomas Weise <thomas@datatorrent.com>
> >> > > > >>> wrote:
> >> > > > >>> I see no reason to move the dimension operator along with
> >> > > > >>> everything it duplicates to Malhar. It's available to use for
> >> > > > >>> everyone as it is and there should be an initiative to make it
> >> > > > >>> conform to the underlying framework to be part of Malhar.
> >> > > > >>>
> >> > > > >>> Also there is already an enrichment operator, there is even
> >> > > > >>> documentation for it.
> >> > > > >>>
> >> > > > >>> Hence, this needs to be analyzed properly.
> >> > > > >>>
> >> > > > >>> Thomas
> >> > > > >>>
> >> > > > >>> On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <pramod@datatorrent.com>
> >> > > > >>> wrote:
> >> > > > >>>
> >> > > > >>>> Yes, I do plan to come up with a proposal with a list. The ones
> >> > > > >>>> that come to mind are flume, enrichment, various dimensional
> >> > > > >>>> operators and any custom partitioners. The dimensional
> >> > > > >>>> operators are in a mature state and usable today, in future
> >> > > > >>>> they could also be ported onto the new windowing and managed
> >> > > > >>>> state operator framework.
> >> > > > >>>>
> >> > > > >>>> Thanks
> >> > > > >>>>
> >> > > > >>>> On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <thomas@datatorrent.com>
> >> > > > >>>> wrote:
> >> > > > >>>>
> >> > > > >>>>> A cursory look suggests there is a lot of overlap. I'm looking
> >> > > > >>>>> forward to see a proposal that reflects a vision how to evolve
> >> > > > >>>>> Malhar rather than just moving around code.
> >> > > > >>>>>
> >> > > > >>>>> Thomas
> >> > > > >>>>>
> >> > > > >>>>>
> >> > > > >>>>> On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <pramod@datatorrent.com>
> >> > > > >>>>> wrote:
> >> > > > >>>>>
> >> > > > >>>>>> Hi,
> >> > > > >>>>>>
> >> > > > >>>>>> DataTorrent, the initial contributor to Apex and the company I
> >> > > > >>>>>> work for, has opened up a library of operators called Megh
> >> > > > >>>>>> recently to the public and has made the repository available
> >> > > > >>>>>> under the Apache License. The link to the repository is below.
> >> > > > >>>>>> These operators, for the most part, contain functionality that
> >> > > > >>>>>> is complementary to what Malhar library provides and were
> >> > > > >>>>>> developed to solve business use cases that arose over time.
> >> > > > >>>>>> Also, some operators in Malhar were inspired from early
> >> > > > >>>>>> implementations in the Megh library and were built upon
> >> > > > >>>>>> knowledge gained in doing the original implementations.
> >> > > > >>>>>>
> >> > > > >>>>>> Our goal is to not have Megh as a separate library but rather
> >> > > > >>>>>> bring these operators into Malhar in a fashion that it is
> >> > > > >>>>>> consistent with the Malhar project and repository. In the
> >> > > > >>>>>> upcoming days, in a gradual fashion, we will have more details
> >> > > > >>>>>> on the individual operators that we would like to contribute.
> >> > > > >>>>>> Also, if you are interested in helping with this effort please
> >> > > > >>>>>> raise your hand.
> >> > > > >>>>>>
> >> > > > >>>>>> https://github.com/DataTorrent/Megh/
> >> > > > >>>>>>
> >> > > > >>>>>> Thanks
> >> > > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
