apex-dev mailing list archives

From Pramod Immaneni <pra...@datatorrent.com>
Subject Re: Megh operator library
Date Sat, 10 Sep 2016 14:20:03 GMT
It would be great to have Tim's help with the dimension computation, but I
think we can still debate whether the HDHT dependency needs to be removed
before contribution or whether it can be done as a two-step process. We
also have a place to put experimental code (contrib), and HDHT could go in
there until we can assess it and port it to use managed state.

My thought is that if it is going to be a significant porting effort,
then we should do it as a two-step process.


> On Sep 9, 2016, at 11:52 PM, Thomas Weise <thomas@datatorrent.com> wrote:
> Tim,
> The functionality of the dimension compute operator should be available in
> Malhar. My concern is moving things without regard to code duplication and
> long term maintenance cost. There are several pieces to the dimension
> compute operator that in fact are (or should be) reusable components by
> themselves. Live querying (queryable state) with schemas is one such
> example. It's a major feature and not limited to the dimension compute
> operator. It should ideally work with the new windowing support as well.
> But the main area that needs work is the state store - the dependency on
> HDHT needs to be removed and replaced with managed state. Also, I'm curious
> why the window operator would not scale for large time buckets. Are you
> referring to the current intermediate implementation or to the work in
> progress that will use incremental state saving? If so, please bring it up
> on APEXMALHAR-2130, as it is pretty important.
> Since you have written almost all of the dimension compute code, could you
> help with the changes needed to bring it over? It would also be good to see
> the user documentation in Malhar.
> Thanks,
> Thomas
> On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <timothyfarkas@apache.org>
> wrote:
>> Hi Thomas,
>> With respect to the dimension operator, I would like to learn more about
>> the underlying framework you mentioned and the code duplication. If you are
>> talking about the Window operator framework, that framework is not suitable
>> for the dimension computation use case because it doesn't scale for large
>> time buckets. Furthermore, that framework has no support for querying. The
>> dimension operators support live queries of the aggregated data. Querying
>> of live data streams is a popular feature in other open source platforms,
>> and I believe it is a worthwhile addition to Malhar.
>> Given the fact that the dimension framework has been used in many POCs and
>> is even running in production and has novel features like live querying, it
>> more than meets the bar for a Malhar contribution. If a concrete argument
>> cannot be provided to prevent this work from going into Malhar, then these
>> efforts should not be blocked.
>> Thanks,
>> Tim
>>> On 2016-09-09 17:18 (-0700), Thomas Weise <thomas@datatorrent.com> wrote:
>>> I see no reason to move the dimension operator along with everything it
>>> duplicates to Malhar. It's available for everyone to use as it is, and there
>>> should be an initiative to make it conform to the underlying framework to
>>> be part of Malhar.
>>> Also there is already an enrichment operator, there is even documentation
>>> for it.
>>> Hence, this needs to be analyzed properly.
>>> Thomas
>>> On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <pramod@datatorrent.com>
>>> wrote:
>>>> Yes, I do plan to come up with a proposal with a list. The ones that come
>>>> to mind are flume, enrichment, various dimensional operators and any custom
>>>> partitioners. The dimensional operators are in a mature state and usable
>>>> today; in the future they could also be ported onto the new windowing and
>>>> managed state operator framework.
>>>> Thanks
>>>> On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <thomas@datatorrent.com>
>>>> wrote:
>>>>> A cursory look suggests there is a lot of overlap. I'm looking forward to
>>>>> seeing a proposal that reflects a vision of how to evolve Malhar rather
>>>>> than just moving around code.
>>>>> Thomas
>>>>> On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <pramod@datatorrent.com>
>>>>> wrote:
>>>>>> Hi,
>>>>>> DataTorrent, the initial contributor to Apex and the company I work for,
>>>>>> recently opened up a library of operators called Megh to the public and
>>>>>> has made the repository available under the Apache License. The link to
>>>>>> the repository is below. These operators, for the most part, contain
>>>>>> functionality that is complementary to what the Malhar library provides
>>>>>> and were developed to solve business use cases that arose over time.
>>>>>> Also, some operators in Malhar were inspired by early implementations in
>>>>>> the Megh library and were built upon knowledge gained in doing the
>>>>>> original implementations.
>>>>>> Our goal is not to have Megh as a separate library but rather to bring
>>>>>> these operators into Malhar in a fashion that is consistent with the
>>>>>> Malhar project and repository. In the upcoming days, we will gradually
>>>>>> share more details on the individual operators that we would like to
>>>>>> contribute. Also, if you are interested in helping with this effort,
>>>>>> please raise your hand.
>>>>>> https://github.com/DataTorrent/Megh/
>>>>>> Thanks
