apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vlad Rozov <v.ro...@datatorrent.com>
Subject Re: Modules
Date Tue, 03 Nov 2015 18:26:19 GMT
I think that just build tool will not provide necessary functionality 
for the second option. Module will have to implement it's own class 
loader otherwise behavior will be undefined in case where application 
embeds module and independently different version of an operator 
included into the module.

Thank you,

Vlad

On 11/3/15 10:10, Thomas Weise wrote:
> Both are valid options and to be solved through the build tool. In one case
> the dependency is declared, in the latter embedded.
>
> Thomas
>
> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v.rozov@datatorrent.com> wrote:
>
>> This brings one more issue that we did not cover in the module design
>> discussions.  What happens when new versions of operators embedded into a
>> module become available? I believe we all go with the assumption that
>> module will pickup version available on the classpath at run-time and it is
>> operator developer responsibility to provide full binary compatibility.
>> Another possible behavior is to consider module as completely independent
>> unit and package all necessary libraries along with the module.
>>
>> Thank you,
>>
>> Vlad
>>
>>
>> On 11/3/15 09:40, Pramod Immaneni wrote:
>>
>>> I am not suggesting we leak the internals or compromise access modifiers.
>>> I
>>> want the module developer to have the ability (not mandatory) to make
>>> available all or a subset of the properties of an operator easily if they
>>> desire without having to create setter/getter for each of them. You don't
>>> have to expose the operator they belong to. My preference would also be to
>>> preserve the namespace of the properties in some way for example by
>>> grouping them by operator name. Think about scenario where people have
>>> built modules using kafka input operator and there is a new kafka
>>> connection property. Without having this ability the modules have to be
>>> changed to support this property. With this feature the module developers
>>> have a choice whether to keep the list of kafka properties fixed in the
>>> module or allow new properties.
>>>
>>> Thanks
>>>
>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <thomas@datatorrent.com>
>>> wrote:
>>>
>>> There is also the option to inherit common module properties through a
>>>> base
>>>> class.
>>>>
>>>> I don't see how this is any different from an operator. The developer
>>>> decides what gets exposed and has the same options to control it.
>>>>
>>>> Encapsulation is good practice, by leaking the module internals the using
>>>> code becomes brittle.
>>>>
>>>> Thomas
>>>>
>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <amol@datatorrent.com> wrote:
>>>>
>>>> The same goes for a Java or C++ class that changes its api. In general
>>>> this
>>>>
>>>>> is left to the developer, and these languages have internals as private
>>>>>
>>>> by
>>>>
>>>>> default for precisely the same purpose. The module developer must have
>>>>>
>>>> the
>>>>
>>>>> right to change internals, keep api clean/constant and expect user code
>>>>>
>>>> to
>>>>
>>>>> not break.
>>>>>
>>>>> Thks,
>>>>> Amol
>>>>>
>>>>>
>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <pramod@datatorrent.com
>>>>> wrote:
>>>>>
>>>>> For 3 and 4 can't we strike a balance between not having to expose the
>>>>>> operators underneath and at the same time not having to write
>>>>>>
>>>>> boilerplate
>>>>> code for all the properties that the module wants to make available
>>>>>> outside. It can quickly become unmanageable. For example, an input
>>>>>>
>>>>> operator
>>>>>
>>>>>> has a new connection property which can be used outside and now all
the
>>>>>> modules using that operator, their code has to be modified to just
add
>>>>>>
>>>>> a
>>>>> pass through setter/getter. How about treating the operator name as a
>>>>> group
>>>>>
>>>>>> name and ability for module developers to easily make available/specify
>>>>>>
>>>>> all
>>>>>
>>>>>> or a subset of the properties of an operator to the user without
having
>>>>>>
>>>>> to
>>>>>
>>>>>> explicitly make each of them a module property.
>>>>>>
>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <amol@datatorrent.com>
>>>>>>
>>>>> wrote:
>>>>> 3,4 should follow conventions where the creator decides the api
>>>>>> (including
>>>>>>
>>>>>>> accessibility). In general only those properties exposed by module
>>>>>>>
>>>>>> creator
>>>>>>
>>>>>>> should be settable. What the module internally does with them
is
>>>>>>>
>>>>>> module
>>>>> designer's call. Accessing internals of module from outside is
>>>>>> uncommon.
>>>>>> For exampe in Java (or C++) private fields/members are not to be
>>>>>> accessed.
>>>>>>
>>>>>>> Properties (setter and getter) are the api that module designer
gives
>>>>>>>
>>>>>> to
>>>>>> the module user. It is dangerous and has unintended consequences
if
>>>>>> module
>>>>>>
>>>>>>> user starts to access internals outside the api.
>>>>>>>
>>>>>>> Partitioning should be next phase. As long as current design
does not
>>>>>>>
>>>>>> halt
>>>>>>
>>>>>>> partitioning it should be ok (which I believe is true).
>>>>>>>
>>>>>>> Thks,
>>>>>>> Amol
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v.rozov@datatorrent.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> 1. +1, though passing original DAG to module's populateDAG()
it is
>>>>>>> not
>>>>>> by
>>>>>>
>>>>>>> design and is the current pull request implementation details.
>>>>>>>> 2. While I agree that both Module and StreamingApplication
let's
>>>>>>>> module/application designer to expose DAG design reuse pattern
and
>>>>>>>> StreamingApplication interface may be extending Module, it
does not
>>>>>>>>
>>>>>>> seem
>>>>>>> to
>>>>>>>
>>>>>>>> buy us much. Do we want to allow certain applications to
be reused
>>>>>>>>
>>>>>>> as
>>>>> Modules in other applications or should application package be
>>>>>>> different
>>>>>>> from Module package? The current approach is to distribute Modules
>>>>>>> as
>>>>> part
>>>>>>>> of .jar for example as part of Malhar library without necessarily
>>>>>>>>
>>>>>>> providing
>>>>>>>
>>>>>>>> all necessary dependencies. Application package on other
side must
>>>>>>>>
>>>>>>> include
>>>>>>>
>>>>>>>> all dependencies not provided by the platform.
>>>>>>>>
>>>>>>>> 3, 4. While this will help Module designer, it may complicate
>>>>>>>>
>>>>>>> Module
>>>>> maintenance and how Modules are used. What if Module designer wants
>>>>>>> to
>>>>>> change it's implementation and replace one operator implementation
>>>>>>> with
>>>>>> another operator? Does StreamingApplication designer need to know
>>>>>>> internal
>>>>>>>
>>>>>>>> structure of Modules? Should Module be considered as a black
box
>>>>>>>>
>>>>>>> during
>>>>>> Application design time as it was initially planned?
>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>>
>>>>>>> functionality
>>>>>> the way I understand it.
>>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>>
>>>>>>> partitioning.
>>>>> One
>>>>>>> of supported cases should be ability to specify cascading
>>>>>>> partitioning
>>>>>> scheme.
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> Vlad
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>>
>>>>>>>> I have some comments and suggestions on the module design.
I think
>>>>>>>> these
>>>>>>> need to be taken into account before we can merge the
>>>>>>>> implementation
>>>>> provided below into the mainline code. I apologize if these should
>>>>>>>> have
>>>>>>> been brought up earlier as for some reason or the other I was
out
>>>>>>>> of
>>>>> loop
>>>>>>>> on this one
>>>>>>>>>        https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>>> <
>>>>>>>>>
>>>>>>>>>
>>>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>>
>>>>>        1. DAG scoping currently in the implementation is global for
>>>>>>>> modules,
>>>>>>>> each module's populateDAG sees the entire DAG. It should
be
>>>>>>>> locally
>>>>> scoped
>>>>>>>> as one module does not and should not know about another.
>>>>>>>>>        2. The module has a populateDAG method with exact
same syntax
>>>>>>>>>
>>>>>>>> as
>>>>>> in
>>>>>>
>>>>>>> StreamingApplication. Is StreamingApplication also a module,
>>>>>>>> should
>>>>> it
>>>>>
>>>>>> extend that interface.
>>>>>>>>>        3. Setting properties for modules is too verbose.
Module
>>>>>>>>>
>>>>>>>> developer
>>>>>>> needs to repeat every property they want exposed with a setter
and
>>>>>>>> getter
>>>>>>>> in JAVA. I don't disagree that module developer should be
able to
>>>>>>>> choose
>>>>>>> which properties from which operators need to be exposed but
the
>>>>>>>> current
>>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>>>             a. Allow modules to specify which operators
and
>>>>>>>>>
>>>>>>>> properties
>>>>> can
>>>>>>> be
>>>>>>>>> accessible from outside. One way is in the "populateDAG"
method of
>>>>>>>>>
>>>>>>>> the
>>>>>> module when adding the operator have the ability to specify if
>>>>>>>> this
>>>>> operator can be accessible from outside and which or all
>>>>>>>> properties
>>>>> can
>>>>>>> be
>>>>>>>
>>>>>>>> accessible.
>>>>>>>>>             b. Provide methods in ModuleMeta or elsewhere
to set
>>>>>>>>>
>>>>>>>> property
>>>>>>> values by specifying the operator name (friendly name) inside
the
>>>>>>>> module
>>>>>>> and property name. If this is allowed by a. above it is successful
>>>>>>>> else
>>>>>>> it
>>>>>>>
>>>>>>>> should fail.
>>>>>>>>>             c. Allow a syntax in property files to specify
the
>>>>>>>>>
>>>>>>>> property
>>>>>> in
>>>>>>
>>>>>>> b.
>>>>>>>>> Example syntax
>>>>>>>>>
>>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>>> propname>
>>>>>>>>>        4. For attributes same mechanism as in 3 should
apply for the
>>>>>>>>> operators
>>>>>>>>> that are exposed by the module.  For property file, example
syntax
>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>>        5. Module developers in addition to 3. and 4.
above may
>>>>>>>>>
>>>>>>>> choose
>>>>> to
>>>>>
>>>>>> support module level properties and attributes. These should not
>>>>>>>> be
>>>>> the
>>>>>>> default when 3. and 4. are possible but complementary, in addition
>>>>>>>> to
>>>>>> them.
>>>>>>>>> In this case for properties they can implement setters
and getters
>>>>>>>>>
>>>>>>>> in
>>>>>> the
>>>>>>>> module. For attributes the user should still be able to set
the
>>>>>>>> attributes
>>>>>>>> using the dag setAttribute method. You could introduce a
method in
>>>>>>>> the
>>>>>> module to process attributes that can get called by the engine
>>>>>>>> once
>>>>> everything is set.
>>>>>>>>>        6. For 5. above setting global properties and
attributes for
>>>>>>>>>
>>>>>>>> module
>>>>>>> is
>>>>>>>>> akin to ideas that have been proposed for the application
as
>>>>>>>>>
>>>>>>>> well. A
>>>>> consistent way must be possible for applications as well even if
>>>>>>>> it
>>>>> is
>>>>>
>>>>>> not
>>>>>>>> implemented now.
>>>>>>>>>        7. For 5. or 6. above there should be a property
file way of
>>>>>>>>> specifying
>>>>>>>>> the global module properties and attributes. Example
syntax
>>>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>>>> Notice the difference with 3. c. and 4 above that there
is no
>>>>>>>>>
>>>>>>>> operator
>>>>>> keyword here.
>>>>>>>>>        8. Partitioning needs to be consistent with what
the user
>>>>>>>>>
>>>>>>>> will
>>>>> expect
>>>>>>>> when they see module as an entity. I will send an image of
>>>>>>>> possible
>>>>> examples of how the user will expect the physical plan to look in
>>>>>>>> certain
>>>>>>>> cases.
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>


Mime
View raw message