apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: Megh operator library
Date Sat, 10 Sep 2016 17:29:28 GMT
Thomas,
IMHO we should also look at the cost to users of keeping code in a GitHub
repository (even if under the ASF 2.0 license) outside Malhar. There is value
in deprecating code in Megh and moving it to Malhar. Volunteers in this
effort could decide how much overlap means "mark as overlapping". My
suggestion is to absorb overlapping operators into a directory in Malhar
that marks them as such. A lot of these operators are being used in
production, and it makes sense to absorb them into the Apache GitHub.

Thanks
Amol




On Sat, Sep 10, 2016 at 7:20 AM, Pramod Immaneni <pramod@datatorrent.com>
wrote:

> It would be great to have Tim's help with dimension computation, but I
> think we can still debate whether the HDHT dependency needs to be removed
> before contribution or whether it can be done as a two-step process,
> since we also have a place to put experimental code (contrib), and HDHT
> could go in there until we can determine/port it to use managed state.
>
> My thought on this is that if it is going to be a significant porting
> effort, then we do it as a two-step process.
>
> Thanks
>
> > On Sep 9, 2016, at 11:52 PM, Thomas Weise <thomas@datatorrent.com>
> > wrote:
> >
> > Tim,
> >
> > The functionality of the dimension compute operator should be available in
> > Malhar. My concern is moving things without regard to code duplication and
> > long term maintenance cost. There are several pieces to the dimension
> > compute operator that in fact are (or should be) reusable components by
> > themselves. Live querying (queryable state) with schemas is one such
> > example. It's a major feature and not limited to the dimension compute
> > operator. It should ideally work with the new windowing support as well.
> > But the main area that needs work is the state store - the dependency on
> > HDHT needs to be removed and replaced with managed state. Also I'm curious
> > why the window operator should not scale for large time buckets? Are you
> > referring to the current intermediate implementation or the work in
> > progress that will use incremental state saving? If so, please bring it up
> > on APEXMALHAR-2130 as it is pretty important.
> >
> > Since you have written almost all of the dimension compute code, could you
> > help with the changes needed to bring it over? It would also be good to see
> > the user documentation in Malhar.
> >
> > Thanks,
> > Thomas
> >
> > On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <timothyfarkas@apache.org>
> > wrote:
> >
> >> Hi Thomas,
> >>
> >> With respect to the dimension operator, I would like to learn more about
> >> the underlying framework you mentioned and the code duplication. If you are
> >> talking about the Window operator framework, that framework is not suitable
> >> for the dimension computation use case because it doesn't scale for large
> >> timebuckets. Furthermore that framework has no support for Querying. The
> >> dimension operators support live queries of the aggregated data. Querying
> >> of live data streams is a popular feature in other open source platforms,
> >> and I believe it is a worthwhile addition to Malhar.
> >>
> >> Given the fact that the dimension framework has been used in many POCs and
> >> is even running in production and has novel features like live querying, it
> >> more than meets the bar for a Malhar contribution. If a concrete argument
> >> cannot be provided to prevent this work from going into Malhar, then these
> >> efforts should not be blocked.
> >>
> >> Thanks,
> >> Tim
> >>
> >>> On 2016-09-09 17:18 (-0700), Thomas Weise <thomas@datatorrent.com>
> >>> wrote:
> >>> I see no reason to move the dimension operator along with everything it
> >>> duplicates to Malhar. It's available to use for everyone as it is and there
> >>> should be an initiative to make it conform to the underlying framework to
> >>> be part of Malhar.
> >>>
> >>> Also there is already an enrichment operator, there is even documentation
> >>> for it.
> >>>
> >>> Hence, this needs to be analyzed properly.
> >>>
> >>> Thomas
> >>>
> >>> On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <pramod@datatorrent.com>
> >>> wrote:
> >>>
> >>>> Yes, I do plan to come up with a proposal with a list. The ones that come
> >>>> to mind are flume, enrichment, various dimensional operators and any custom
> >>>> partitioners. The dimensional operators are in a mature state and usable
> >>>> today, in future they could also be ported onto the new windowing and
> >>>> managed state operator framework.
> >>>>
> >>>> Thanks
> >>>>
> >>>> On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <thomas@datatorrent.com>
> >>>> wrote:
> >>>>
> >>>>> A cursory look suggests there is a lot of overlap. I'm looking forward to
> >>>>> see a proposal that reflects a vision how to evolve Malhar rather than just
> >>>>> moving around code.
> >>>>>
> >>>>> Thomas
> >>>>>
> >>>>>
> >>>>> On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <pramod@datatorrent.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> DataTorrent, the initial contributor to Apex and the company I work for,
> >>>>>> has opened up a library of operators called Megh recently to the public and
> >>>>>> has made the repository available under the Apache License. The link to the
> >>>>>> repository is below. These operators, for the most part, contain
> >>>>>> functionality that is complementary to what Malhar library provides and
> >>>>>> were developed to solve business use cases that arose over time. Also, some
> >>>>>> operators in Malhar were inspired from early implementations in the Megh
> >>>>>> library and were built upon knowledge gained in doing the original
> >>>>>> implementations.
> >>>>>>
> >>>>>> Our goal is to not have Megh as a separate library but rather bring these
> >>>>>> operators into Malhar in a fashion that it is consistent with the Malhar
> >>>>>> project and repository. In the upcoming days, in a gradual fashion, we will
> >>>>>> have more details on the individual operators that we would like to
> >>>>>> contribute. Also, if you are interested in helping with this effort please
> >>>>>> raise your hand.
> >>>>>>
> >>>>>> https://github.com/DataTorrent/Megh/
> >>>>>>
> >>>>>> Thanks
