apex-dev mailing list archives

From Thomas Weise <tho...@datatorrent.com>
Subject Re: Megh operator library
Date Sat, 10 Sep 2016 06:52:19 GMT
Tim,

The functionality of the dimension compute operator should be available in
Malhar. My concern is moving things without regard to code duplication and
long term maintenance cost. There are several pieces to the dimension
compute operator that in fact are (or should be) reusable components by
themselves. Live querying (queryable state) with schemas is one such
example. It's a major feature and not limited to the dimension compute
operator. It should ideally work with the new windowing support as well.
But the main area that needs work is the state store - the dependency on
HDHT needs to be removed and replaced with managed state. Also, I'm curious
why the window operator would not scale for large time buckets. Are you
referring to the current intermediate implementation or the work in
progress that will use incremental state saving? If so, please bring it up
on APEXMALHAR-2130 as it is pretty important.

Since you have written almost all of the dimension compute code, could you
help with the changes needed to bring it over? It would also be good to see
the user documentation in Malhar.

Thanks,
Thomas

On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <timothyfarkas@apache.org>
wrote:

> Hi Thomas,
>
> With respect to the dimension operator, I would like to learn more about
> the underlying framework you mentioned and the code duplication. If you are
> talking about the Window operator framework, that framework is not suitable
> for the dimension computation use case because it doesn't scale for large
> time buckets. Furthermore, that framework has no support for querying. The
> dimension operators support live queries of the aggregated data. Querying
> of live data streams is a popular feature in other open source platforms,
> and I believe it is a worthwhile addition to Malhar.
>
> Given the fact that the dimension framework has been used in many POCs and
> is even running in production and has novel features like live querying, it
> more than meets the bar for a Malhar contribution. If a concrete argument
> cannot be provided to prevent this work from going into Malhar, then these
> efforts should not be blocked.
>
> Thanks,
> Tim
>
> On 2016-09-09 17:18 (-0700), Thomas Weise <thomas@datatorrent.com> wrote:
> > I see no reason to move the dimension operator along with everything it
> > duplicates to Malhar. It's available to use for everyone as it is and there
> > should be an initiative to make it conform to the underlying framework to
> > be part of Malhar.
> >
> > Also there is already an enrichment operator, there is even documentation
> > for it.
> >
> > Hence, this needs to be analyzed properly.
> >
> > Thomas
> >
> > On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <pramod@datatorrent.com>
> > wrote:
> >
> > > Yes, I do plan to come up with a proposal with a list. The ones that come
> > > to mind are Flume, enrichment, various dimensional operators and any custom
> > > partitioners. The dimensional operators are in a mature state and usable
> > > today; in the future they could also be ported onto the new windowing and
> > > managed state operator framework.
> > >
> > > Thanks
> > >
> > > On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <thomas@datatorrent.com>
> > > wrote:
> > >
> > > > A cursory look suggests there is a lot of overlap. I'm looking forward to
> > > > seeing a proposal that reflects a vision of how to evolve Malhar rather than
> > > > just moving around code.
> > > >
> > > > Thomas
> > > >
> > > >
> > > > On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <pramod@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > DataTorrent, the initial contributor to Apex and the company I work for,
> > > > > has recently opened up a library of operators called Megh to the public and
> > > > > has made the repository available under the Apache License. The link to the
> > > > > repository is below. These operators, for the most part, contain
> > > > > functionality that is complementary to what the Malhar library provides and
> > > > > were developed to solve business use cases that arose over time. Also, some
> > > > > operators in Malhar were inspired by early implementations in the Megh
> > > > > library and were built upon knowledge gained in doing the original
> > > > > implementations.
> > > > >
> > > > > Our goal is to not have Megh as a separate library but rather bring these
> > > > > operators into Malhar in a fashion that is consistent with the Malhar
> > > > > project and repository. In the upcoming days, in a gradual fashion, we will
> > > > > have more details on the individual operators that we would like to
> > > > > contribute. Also, if you are interested in helping with this effort, please
> > > > > raise your hand.
> > > > >
> > > > > https://github.com/DataTorrent/Megh/
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>
