apex-dev mailing list archives

From Pramod Immaneni <pra...@datatorrent.com>
Subject Re: Megh operator library
Date Tue, 27 Sep 2016 05:22:58 GMT
Added a section for Flume based on the feedback.

Thanks

On Mon, Sep 26, 2016 at 8:51 AM, Pramod Immaneni <pramod@datatorrent.com>
wrote:

> Hi Thomas,
>
> My responses are inline
>
> On Sun, Sep 25, 2016 at 11:39 AM, Thomas Weise <thomas.weise@gmail.com>
> wrote:
>
>> Thanks for putting it together. It looks like there are really only 2
>> operators?
>>
>
> There were others, but it looked like there were already good implementations
> or alternatives for them in Malhar. For example, enrichment and deduper have
> implementations already, and for the laggards operator it looked like the
> concept is already covered in the new windowing work.
>
>
>>
>> +1 for the Flume connector. It would be good to also look what has changed
>> in Flume since it was written. It needs its own Maven module and
>> documentation is also needed.
>>
>
> Yes, in the table in the document I have it going to its own module and
> path. I will make a note in the document about checking against newer Flume
> versions and about documentation.
>
>
>> I don't agree with the proposed "as-is" move for the dimension compute
>> operator into contrib. It does not belong there. Contrib is for new,
>> incomplete work ("immature" and under the radar WRT CI etc.), with
>> particular focus to provide an easier entry path for new contributors.
>>
>> I would like to see the following changes to dimension computation:
>> * Replace HDHT with managed state (or spillable DS)
>> * Move to org.apache.apex.malhar.lib.*
>> * Documentation (your draft is a good start towards that), it also needs to
>> cover query support.
>>
>> I think it is a very valuable operator that should be a first class citizen
>> and the folks familiar with the operator and state management should take
>> up the work to port it. Tim indicated he may be able to take it up.
>>
>> In the meantime, the operator can remain in the Megh repository under
>> existing name and consumed from there.
>>
>
> I thought it could eventually have its own module under Malhar but
> suggested contrib as an intermediate location until any porting is
> completed. I agree about the documentation; I just wrote up something quick
> to highlight the operator, and I think Tim has more detailed docs for it.
> Since the operator(s) are readily usable in production applications and
> implement quite a bit of valuable functionality, I am of the opinion that
> we do the minimum now to make it available and, in parallel, start the
> work on porting some of the internal subsystems to newer components.
>
> Thanks
>
>
>>
>> Thomas
>>
>> On Sat, Sep 24, 2016 at 12:29 PM, Pramod Immaneni <pramod@datatorrent.com>
>> wrote:
>>
>> > Hi,
>> >
>> > Here is the initial proposal. Please go through it and you can comment
>> > right on the document. Regarding the discussions around Dimensional
>> > operators, there is a specific section for it and future plans. After the
>> > comments are addressed, I can start with one of the components such as
>> > flume and document the steps involved. Then others can take up the other
>> > components and use the steps in a similar fashion.
>> >
>> > https://docs.google.com/document/d/1BzWAwJDEUs0G42DWTuGYvM5sm0Uu5nTP7cUQOAlVs0g
>> >
>> > Thanks
>> >
>> > On Sat, Sep 10, 2016 at 10:29 AM, Amol Kekre <amol@datatorrent.com>
>> > wrote:
>> >
>> > > Thomas,
>> > > IMHO we should also look at the cost to users of keeping code in a
>> > > GitHub repository (even if under the ASF 2.0 license) outside Malhar.
>> > > There is value in deprecating code in Megh and moving it to Malhar.
>> > > Volunteers in this effort could decide how much overlap means "mark as
>> > > overlapping". My suggestion is to absorb overlapping operators into a
>> > > directory in Malhar that marks them as such. A lot of these operators
>> > > are being used in production and it makes sense to absorb them into
>> > > Apache GitHub.
>> > >
>> > > Thks
>> > > Amol
>> > >
>> > >
>> > >
>> > >
>> > > On Sat, Sep 10, 2016 at 7:20 AM, Pramod Immaneni <pramod@datatorrent.com>
>> > > wrote:
>> > >
>> > > > It would be great to have Tim's help with dimension computation, but I
>> > > > think we can still debate whether the HDHT dependency needs to be
>> > > > removed before contribution or whether it can be done as a two-step
>> > > > process, since we also have a place to put experimental code, contrib,
>> > > > and HDHT could go in there until we can determine/port it to use
>> > > > managed state.
>> > > >
>> > > > My thought on this is that if it is going to be a significant
>> porting
>> > > > effort then we do it as a two step process.
>> > > >
>> > > > Thanks
>> > > >
>> > > > > On Sep 9, 2016, at 11:52 PM, Thomas Weise <thomas@datatorrent.com>
>> > > > > wrote:
>> > > > >
>> > > > > Tim,
>> > > > >
>> > > > > The functionality of the dimension compute operator should be
>> > > > > available in Malhar. My concern is moving things without regard to
>> > > > > code duplication and long term maintenance cost. There are several
>> > > > > pieces to the dimension compute operator that in fact are (or should
>> > > > > be) reusable components by themselves. Live querying (queryable
>> > > > > state) with schemas is one such example. It's a major feature and
>> > > > > not limited to the dimension compute operator. It should ideally
>> > > > > work with the new windowing support as well. But the main area that
>> > > > > needs work is the state store - the dependency on HDHT needs to be
>> > > > > removed and replaced with managed state. Also I'm curious why the
>> > > > > window operator should not scale for large time buckets? Are you
>> > > > > referring to the current intermediate implementation or the work in
>> > > > > progress that will use incremental state saving? If so, please bring
>> > > > > it up on APEXMALHAR-2130 as it is pretty important.
>> > > > >
>> > > > > Since you have written almost all of the dimension compute code,
>> > > > > could you help with the changes needed to bring it over? It would
>> > > > > also be good to see the user documentation in Malhar.
>> > > > >
>> > > > > Thanks,
>> > > > > Thomas
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <timothyfarkas@apache.org>
>> > > > > wrote:
>> > > > >
>> > > > >> Hi Thomas,
>> > > > >>
>> > > > >> With respect to the dimension operator, I would like to learn more
>> > > > >> about the underlying framework you mentioned and the code
>> > > > >> duplication. If you are talking about the Window operator framework,
>> > > > >> that framework is not suitable for the dimension computation use
>> > > > >> case because it doesn't scale for large timebuckets. Furthermore
>> > > > >> that framework has no support for Querying. The dimension operators
>> > > > >> support live queries of the aggregated data. Querying of live data
>> > > > >> streams is a popular feature in other open source platforms, and I
>> > > > >> believe it is a worthwhile addition to Malhar.
>> > > > >>
>> > > > >> Given the fact that the dimension framework has been used in many
>> > > > >> POCs and is even running in production and has novel features like
>> > > > >> live querying, it more than meets the bar for a Malhar contribution.
>> > > > >> If a concrete argument cannot be provided to prevent this work from
>> > > > >> going into Malhar, then these efforts should not be blocked.
>> > > > >>
>> > > > >> Thanks,
>> > > > >> Tim
>> > > > >>
>> > > > >>> On 2016-09-09 17:18 (-0700), Thomas Weise <thomas@datatorrent.com>
>> > > > >>> wrote:
>> > > > >>> I see no reason to move the dimension operator along with
>> > > > >>> everything it duplicates to Malhar. It's available to use for
>> > > > >>> everyone as it is and there should be an initiative to make it
>> > > > >>> conform to the underlying framework to be part of Malhar.
>> > > > >>>
>> > > > >>> Also there is already an enrichment operator, there is even
>> > > > >>> documentation for it.
>> > > > >>>
>> > > > >>> Hence, this needs to be analyzed properly.
>> > > > >>>
>> > > > >>> Thomas
>> > > > >>>
>> > > > >>> On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <pramod@datatorrent.com>
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>>> Yes, I do plan to come up with a proposal with a list. The ones
>> > > > >>>> that come to mind are flume, enrichment, various dimensional
>> > > > >>>> operators and any custom partitioners. The dimensional operators
>> > > > >>>> are in a mature state and usable today, in future they could also
>> > > > >>>> be ported onto the new windowing and managed state operator
>> > > > >>>> framework.
>> > > > >>>>
>> > > > >>>> Thanks
>> > > > >>>>
>> > > > >>>> On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <thomas@datatorrent.com>
>> > > > >>>> wrote:
>> > > > >>>>
>> > > > >>>>> A cursory look suggests there is a lot of overlap. I'm looking
>> > > > >>>>> forward to seeing a proposal that reflects a vision of how to
>> > > > >>>>> evolve Malhar rather than just moving around code.
>> > > > >>>>>
>> > > > >>>>> Thomas
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <pramod@datatorrent.com>
>> > > > >>>>> wrote:
>> > > > >>>>>
>> > > > >>>>>> Hi,
>> > > > >>>>>>
>> > > > >>>>>> DataTorrent, the initial contributor to Apex and the company I
>> > > > >>>>>> work for, has opened up a library of operators called Megh
>> > > > >>>>>> recently to the public and has made the repository available
>> > > > >>>>>> under the Apache License. The link to the repository is below.
>> > > > >>>>>> These operators, for the most part, contain functionality that
>> > > > >>>>>> is complementary to what Malhar library provides and were
>> > > > >>>>>> developed to solve business use cases that arose over time.
>> > > > >>>>>> Also, some operators in Malhar were inspired from early
>> > > > >>>>>> implementations in the Megh library and were built upon
>> > > > >>>>>> knowledge gained in doing the original implementations.
>> > > > >>>>>>
>> > > > >>>>>> Our goal is to not have Megh as a separate library but rather
>> > > > >>>>>> bring these operators into Malhar in a fashion that is
>> > > > >>>>>> consistent with the Malhar project and repository. In the
>> > > > >>>>>> upcoming days, in a gradual fashion, we will have more details
>> > > > >>>>>> on the individual operators that we would like to contribute.
>> > > > >>>>>> Also, if you are interested in helping with this effort please
>> > > > >>>>>> raise your hand.
>> > > > >>>>>>
>> > > > >>>>>> https://github.com/DataTorrent/Megh/
>> > > > >>>>>>
>> > > > >>>>>> Thanks
>> > > > >>
>> > > >
>> > >
>> >
>>
>
>
