apex-dev mailing list archives

From Thomas Weise <thomas.we...@gmail.com>
Subject Re: Megh operator library
Date Sun, 25 Sep 2016 18:39:45 GMT
Thanks for putting it together. It looks like there are really only 2
operators?

+1 for the Flume connector. It would be good to also look at what has changed
in Flume since it was written. It needs its own Maven module, and
documentation is also needed.

I don't agree with the proposed "as-is" move of the dimension compute
operator into contrib. It does not belong there. Contrib is for new,
incomplete work ("immature" and under the radar WRT CI etc.), with a
particular focus on providing an easier entry path for new contributors.

I would like to see the following changes to dimension computation:
* Replace HDHT with managed state (or spillable DS)
* Move to org.apache.apex.malhar.lib.*
* Documentation (your draft is a good start towards that); it also needs to
cover query support.

I think it is a very valuable operator that should be a first class citizen
and the folks familiar with the operator and state management should take
up the work to port it. Tim indicated he may be able to take it up.

In the meantime, the operator can remain in the Megh repository under its
existing name and be consumed from there.

Thomas

On Sat, Sep 24, 2016 at 12:29 PM, Pramod Immaneni <pramod@datatorrent.com>
wrote:

> Hi,
>
> Here is the initial proposal. Please go through it and you can comment
> right on the document. Regarding the discussions around Dimensional
> operators, there is a specific section for it and future plans. After the
> comments are addressed, I can start with one of the components such as
> flume and document the steps involved. Then others can take up the other
> components and use the steps in a similar fashion.
>
> https://docs.google.com/document/d/1BzWAwJDEUs0G42DWTuGYvM5sm0Uu5nTP7cUQOAlVs0g
>
> Thanks
>
> On Sat, Sep 10, 2016 at 10:29 AM, Amol Kekre <amol@datatorrent.com> wrote:
>
> > Thomas,
> > IMHO we should also look at the cost to users of keeping code in a GitHub
> > repository (even if under ASF 2.0 license) outside Malhar. There is value in
> > deprecating code in Megh and moving it to Malhar. Volunteers in this
> > effort could decide how much overlap means "mark as overlapping". My
> > suggestion is to absorb overlapping operators into a directory in Malhar
> > that marks them as such. A lot of these operators are being used in
> > production and it makes sense to absorb them into Apache GitHub.
> >
> > Thks
> > Amol
> >
> > On Sat, Sep 10, 2016 at 7:20 AM, Pramod Immaneni <pramod@datatorrent.com>
> > wrote:
> >
> > > It would be great to have Tim's help with dimension computation but I
> > > think we can still debate whether the HDHT dependency needs to be removed
> > > before contribution or whether it can be done as a two-step process,
> > > since we also have a place to put experimental code (contrib) and HDHT
> > > could go in there till we can determine/port it to use managed state.
> > >
> > > My thought on this is that if it is going to be a significant porting
> > > effort then we do it as a two-step process.
> > >
> > > Thanks
> > >
> > > > On Sep 9, 2016, at 11:52 PM, Thomas Weise <thomas@datatorrent.com>
> > > wrote:
> > > >
> > > > Tim,
> > > >
> > > > > The functionality of the dimension compute operator should be available
> > > > > in Malhar. My concern is moving things without regard to code duplication
> > > > > and long term maintenance cost. There are several pieces to the dimension
> > > > > compute operator that in fact are (or should be) reusable components by
> > > > > themselves. Live querying (queryable state) with schemas is one such
> > > > > example. It's a major feature and not limited to the dimension compute
> > > > > operator. It should ideally work with the new windowing support as well.
> > > > > But the main area that needs work is the state store - the dependency on
> > > > > HDHT needs to be removed and replaced with managed state. Also I'm curious
> > > > > why the window operator should not scale for large time buckets? Are you
> > > > > referring to the current intermediate implementation or the work in
> > > > > progress that will use incremental state saving? If so, please bring it up
> > > > > on APEXMALHAR-2130 as it is pretty important.
> > > >
> > > > > Since you have written almost all of the dimension compute code, could you
> > > > > help with the changes needed to bring it over? It would also be good to see
> > > > > the user documentation in Malhar.
> > > >
> > > > Thanks,
> > > > Thomas
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <timothyfarkas@apache.org>
> > > > > wrote:
> > > >
> > > >> Hi Thomas,
> > > >>
> > > > >> With respect to the dimension operator, I would like to learn more about
> > > > >> the underlying framework you mentioned and the code duplication. If you are
> > > > >> talking about the Window operator framework, that framework is not suitable
> > > > >> for the dimension computation use case because it doesn't scale for large
> > > > >> timebuckets. Furthermore that framework has no support for Querying. The
> > > > >> dimension operators support live queries of the aggregated data. Querying
> > > > >> of live data streams is a popular feature in other open source platforms,
> > > > >> and I believe it is a worthwhile addition to Malhar.
> > > >>
> > > > >> Given the fact that the dimension framework has been used in many POCs and
> > > > >> is even running in production and has novel features like live querying, it
> > > > >> more than meets the bar for a Malhar contribution. If a concrete argument
> > > > >> cannot be provided to prevent this work from going into Malhar, then these
> > > > >> efforts should not be blocked.
> > > >>
> > > >> Thanks,
> > > >> Tim
> > > >>
> > > > >>> On 2016-09-09 17:18 (-0700), Thomas Weise <thomas@datatorrent.com>
> > > > >>> wrote:
> > > > >>> I see no reason to move the dimension operator along with everything it
> > > > >>> duplicates to Malhar. It's available to use for everyone as it is and
> > > > >>> there should be an initiative to make it conform to the underlying
> > > > >>> framework to be part of Malhar.
> > > >>>
> > > > >>> Also there is already an enrichment operator, there is even documentation
> > > > >>> for it.
> > > >>>
> > > >>> Hence, this needs to be analyzed properly.
> > > >>>
> > > >>> Thomas
> > > >>>
> > > > >>> On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <pramod@datatorrent.com>
> > > > >>> wrote:
> > > >>>
> > > > >>>> Yes, I do plan to come up with a proposal with a list. The ones that come
> > > > >>>> to mind are flume, enrichment, various dimensional operators and any custom
> > > > >>>> partitioners. The dimensional operators are in a mature state and usable
> > > > >>>> today, in future they could also be ported onto the new windowing and
> > > > >>>> managed state operator framework.
> > > >>>>
> > > >>>> Thanks
> > > >>>>
> > > > >>>> On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <thomas@datatorrent.com>
> > > > >>>> wrote:
> > > >>>>
> > > > >>>>> A cursory look suggests there is a lot of overlap. I'm looking forward to
> > > > >>>>> see a proposal that reflects a vision how to evolve Malhar rather than just
> > > > >>>>> moving around code.
> > > >>>>>
> > > >>>>> Thomas
> > > >>>>>
> > > >>>>>
> > > > >>>>> On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <pramod@datatorrent.com>
> > > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi,
> > > >>>>>>
> > > > >>>>>> DataTorrent, the initial contributor to Apex and the company I work for,
> > > > >>>>>> has opened up a library of operators called Megh recently to the public and
> > > > >>>>>> has made the repository available under the Apache License. The link to the
> > > > >>>>>> repository is below. These operators, for the most part, contain
> > > > >>>>>> functionality that is complementary to what Malhar library provides and
> > > > >>>>>> were developed to solve business use cases that arose over time. Also, some
> > > > >>>>>> operators in Malhar were inspired from early implementations in the Megh
> > > > >>>>>> library and were built upon knowledge gained in doing the original
> > > > >>>>>> implementations.
> > > >>>>>>
> > > > >>>>>> Our goal is to not have Megh as a separate library but rather bring these
> > > > >>>>>> operators into Malhar in a fashion that it is consistent with the Malhar
> > > > >>>>>> project and repository. In the upcoming days, in a gradual fashion, we will
> > > > >>>>>> have more details on the individual operators that we would like to
> > > > >>>>>> contribute. Also, if you are interested in helping with this effort please
> > > > >>>>>> raise your hand.
> > > >>>>>>
> > > >>>>>> https://github.com/DataTorrent/Megh/
> > > >>>>>>
> > > >>>>>> Thanks
> > > >>
> > >
> >
>
