apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pramod Immaneni <pra...@datatorrent.com>
Subject Re: Modules
Date Tue, 03 Nov 2015 18:55:30 GMT
Classpath can be engineered

On Tue, Nov 3, 2015 at 10:44 AM, Vlad Rozov <v.rozov@datatorrent.com> wrote:

> Even in case when operator and module are not instantiated in the same
> container there will be two different versions on the classpath, so it is
> undefined which one will be used.
>
> Thank you,
>
> Vlad
>
>
> On 11/3/15 10:29, Pramod Immaneni wrote:
>
>> Depends if the operator and module are container local or not.
>>
>> On Tue, Nov 3, 2015 at 10:26 AM, Vlad Rozov <v.rozov@datatorrent.com>
>> wrote:
>>
>> I think that just build tool will not provide necessary functionality for
>>> the second option. Module will have to implement it's own class loader
>>> otherwise behavior will be undefined in case where application embeds
>>> module and independently different version of an operator included into
>>> the
>>> module.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On 11/3/15 10:10, Thomas Weise wrote:
>>>
>>> Both are valid options and to be solved through the build tool. In one
>>>> case
>>>> the dependency is declared, in the latter embedded.
>>>>
>>>> Thomas
>>>>
>>>> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v.rozov@datatorrent.com>
>>>> wrote:
>>>>
>>>> This brings one more issue that we did not cover in the module design
>>>>
>>>>> discussions.  What happens when new versions of operators embedded
>>>>> into a
>>>>> module become available? I believe we all go with the assumption that
>>>>> module will pickup version available on the classpath at run-time and
>>>>> it
>>>>> is
>>>>> operator developer responsibility to provide full binary compatibility.
>>>>> Another possible behavior is to consider module as completely
>>>>> independent
>>>>> unit and package all necessary libraries along with the module.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>>
>>>>> On 11/3/15 09:40, Pramod Immaneni wrote:
>>>>>
>>>>> I am not suggesting we leak the internals or compromise access
>>>>> modifiers.
>>>>>
>>>>>> I
>>>>>> want the module developer to have the ability (not mandatory) to
make
>>>>>> available all or a subset of the properties of an operator easily
if
>>>>>> they
>>>>>> desire without having to create setter/getter for each of them. You
>>>>>> don't
>>>>>> have to expose the operator they belong to. My preference would also
>>>>>> be
>>>>>> to
>>>>>> preserve the namespace of the properties in some way for example
by
>>>>>> grouping them by operator name. Think about scenario where people
have
>>>>>> built modules using kafka input operator and there is a new kafka
>>>>>> connection property. Without having this ability the modules have
to
>>>>>> be
>>>>>> changed to support this property. With this feature the module
>>>>>> developers
>>>>>> have a choice whether to keep the list of kafka properties fixed
in
>>>>>> the
>>>>>> module or allow new properties.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <thomas@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>> There is also the option to inherit common module properties through
a
>>>>>>
>>>>>> base
>>>>>>> class.
>>>>>>>
>>>>>>> I don't see how this is any different from an operator. The developer
>>>>>>> decides what gets exposed and has the same options to control
it.
>>>>>>>
>>>>>>> Encapsulation is good practice, by leaking the module internals
the
>>>>>>> using
>>>>>>> code becomes brittle.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <amol@datatorrent.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> The same goes for a Java or C++ class that changes its api. In
>>>>>>> general
>>>>>>> this
>>>>>>>
>>>>>>> is left to the developer, and these languages have internals
as
>>>>>>> private
>>>>>>>
>>>>>>>> by
>>>>>>>>
>>>>>>> default for precisely the same purpose. The module developer
must
>>>>>>> have
>>>>>>>
>>>>>>>> the
>>>>>>>>
>>>>>>> right to change internals, keep api clean/constant and expect
user
>>>>>>> code
>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>> not break.
>>>>>>>
>>>>>>>> Thks,
>>>>>>>> Amol
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <
>>>>>>>> pramod@datatorrent.com
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> For 3 and 4 can't we strike a balance between not having
to expose
>>>>>>>> the
>>>>>>>>
>>>>>>>> operators underneath and at the same time not having to write
>>>>>>>>>
>>>>>>>>> boilerplate
>>>>>>>>>
>>>>>>>> code for all the properties that the module wants to make
available
>>>>>>>>
>>>>>>>> outside. It can quickly become unmanageable. For example,
an input
>>>>>>>>>
>>>>>>>>> operator
>>>>>>>>>
>>>>>>>> has a new connection property which can be used outside and
now all
>>>>>>>>
>>>>>>>>> the
>>>>>>>>> modules using that operator, their code has to be modified
to just
>>>>>>>>> add
>>>>>>>>>
>>>>>>>>> a
>>>>>>>>>
>>>>>>>> pass through setter/getter. How about treating the operator
name as
>>>>>>>> a
>>>>>>>> group
>>>>>>>>
>>>>>>>> name and ability for module developers to easily make
>>>>>>>>
>>>>>>>>> available/specify
>>>>>>>>>
>>>>>>>>> all
>>>>>>>>>
>>>>>>>> or a subset of the properties of an operator to the user
without
>>>>>>>>
>>>>>>>>> having
>>>>>>>>>
>>>>>>>>> to
>>>>>>>>>
>>>>>>>> explicitly make each of them a module property.
>>>>>>>>
>>>>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <amol@datatorrent.com>
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>> 3,4 should follow conventions where the creator decides the
api
>>>>>>>>
>>>>>>>> (including
>>>>>>>>>
>>>>>>>>> accessibility). In general only those properties exposed
by module
>>>>>>>>>
>>>>>>>>>> creator
>>>>>>>>>>
>>>>>>>>> should be settable. What the module internally does with
them is
>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>
>>>>>>>>> designer's call. Accessing internals of module from outside
is
>>>>>>>>
>>>>>>>> uncommon.
>>>>>>>>> For exampe in Java (or C++) private fields/members are
not to be
>>>>>>>>> accessed.
>>>>>>>>>
>>>>>>>>> Properties (setter and getter) are the api that module
designer
>>>>>>>>> gives
>>>>>>>>>
>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>> the module user. It is dangerous and has unintended consequences
if
>>>>>>>>> module
>>>>>>>>>
>>>>>>>>> user starts to access internals outside the api.
>>>>>>>>>
>>>>>>>>>> Partitioning should be next phase. As long as current
design does
>>>>>>>>>> not
>>>>>>>>>>
>>>>>>>>>> halt
>>>>>>>>>>
>>>>>>>>> partitioning it should be ok (which I believe is true).
>>>>>>>>>
>>>>>>>>>> Thks,
>>>>>>>>>> Amol
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <
>>>>>>>>>> v.rozov@datatorrent.com
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> 1. +1, though passing original DAG to module's populateDAG()
it is
>>>>>>>>>> not
>>>>>>>>>>
>>>>>>>>>> by
>>>>>>>>>
>>>>>>>>> design and is the current pull request implementation
details.
>>>>>>>>>
>>>>>>>>>> 2. While I agree that both Module and StreamingApplication
let's
>>>>>>>>>>> module/application designer to expose DAG design
reuse pattern
>>>>>>>>>>> and
>>>>>>>>>>> StreamingApplication interface may be extending
Module, it does
>>>>>>>>>>> not
>>>>>>>>>>>
>>>>>>>>>>> seem
>>>>>>>>>>>
>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>>> buy us much. Do we want to allow certain applications
to be reused
>>>>>>>>>>
>>>>>>>>>>> as
>>>>>>>>>>>
>>>>>>>>>> Modules in other applications or should application
package be
>>>>>>>>> different
>>>>>>>>>
>>>>>>>>>> from Module package? The current approach is to distribute
Modules
>>>>>>>>>> as
>>>>>>>>>>
>>>>>>>>>> part
>>>>>>>>> of .jar for example as part of Malhar library without
necessarily
>>>>>>>>>
>>>>>>>>>> providing
>>>>>>>>>>>
>>>>>>>>>> all necessary dependencies. Application package on
other side must
>>>>>>>>>>
>>>>>>>>>>> include
>>>>>>>>>>>
>>>>>>>>>> all dependencies not provided by the platform.
>>>>>>>>>>
>>>>>>>>>>> 3, 4. While this will help Module designer, it
may complicate
>>>>>>>>>>>
>>>>>>>>>>> Module
>>>>>>>>>>>
>>>>>>>>>> maintenance and how Modules are used. What if Module
designer
>>>>>>>>> wants
>>>>>>>>> to
>>>>>>>>> change it's implementation and replace one operator implementation
>>>>>>>>>
>>>>>>>>> with
>>>>>>>>>>
>>>>>>>>>> another operator? Does StreamingApplication designer
need to know
>>>>>>>>>
>>>>>>>>> internal
>>>>>>>>>>
>>>>>>>>>> structure of Modules? Should Module be considered
as a black box
>>>>>>>>>>
>>>>>>>>>>> during
>>>>>>>>>>>
>>>>>>>>>> Application design time as it was initially planned?
>>>>>>>>>
>>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>>>>
>>>>>>>>>>> functionality
>>>>>>>>>>>
>>>>>>>>>> the way I understand it.
>>>>>>>>>
>>>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>>>>
>>>>>>>>>>> partitioning.
>>>>>>>>>>>
>>>>>>>>>> One
>>>>>>>>> of supported cases should be ability to specify cascading
>>>>>>>>>
>>>>>>>>>> partitioning
>>>>>>>>>>
>>>>>>>>>> scheme.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>>
>>>>>>>>>>> Vlad
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>>>>>
>>>>>>>>>>> I have some comments and suggestions on the module
design. I
>>>>>>>>>>> think
>>>>>>>>>>> these
>>>>>>>>>>>
>>>>>>>>>>> need to be taken into account before we can merge
the
>>>>>>>>>>
>>>>>>>>>> implementation
>>>>>>>>>>>
>>>>>>>>>>> provided below into the mainline code. I apologize
if these
>>>>>>>>>> should
>>>>>>>>>>
>>>>>>>>> have
>>>>>>>>>
>>>>>>>>>> been brought up earlier as for some reason or the
other I was out
>>>>>>>>>>
>>>>>>>>>> of
>>>>>>>>>>>
>>>>>>>>>>> loop
>>>>>>>>>>
>>>>>>>>> on this one
>>>>>>>>>
>>>>>>>>>>         https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>>>>>> <
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>> https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963
>>>>>>>
>>>>>>>         1. DAG scoping currently in the implementation is global
for
>>>>>>>
>>>>>>>> modules,
>>>>>>>>>
>>>>>>>>>> each module's populateDAG sees the entire DAG. It
should be
>>>>>>>>>>> locally
>>>>>>>>>>>
>>>>>>>>>>> scoped
>>>>>>>>>>
>>>>>>>>> as one module does not and should not know about another.
>>>>>>>>>
>>>>>>>>>>         2. The module has a populateDAG method with
exact same
>>>>>>>>>>>> syntax
>>>>>>>>>>>>
>>>>>>>>>>>> as
>>>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>
>>>>>>>>> StreamingApplication. Is StreamingApplication also a
module,
>>>>>>>>>
>>>>>>>>>> should
>>>>>>>>>>>
>>>>>>>>>>> it
>>>>>>>>>>
>>>>>>>>> extend that interface.
>>>>>>>>
>>>>>>>>>         3. Setting properties for modules is too verbose.
Module
>>>>>>>>>>
>>>>>>>>>>> developer
>>>>>>>>>>>>
>>>>>>>>>>> needs to repeat every property they want exposed
with a setter
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>>> getter
>>>>>>>>>>> in JAVA. I don't disagree that module developer
should be able to
>>>>>>>>>>> choose
>>>>>>>>>>>
>>>>>>>>>>> which properties from which operators need to
be exposed but the
>>>>>>>>>>
>>>>>>>>>> current
>>>>>>>>>>>
>>>>>>>>>>> way seems to duplicate code. Here is a suggestion.
>>>>>>>>>>
>>>>>>>>>>              a. Allow modules to specify which operators
and
>>>>>>>>>>>
>>>>>>>>>>>> properties
>>>>>>>>>>>>
>>>>>>>>>>> can
>>>>>>>>>>
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>>> accessible from outside. One way is in the "populateDAG"
method of
>>>>>>>>>>>
>>>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>> module when adding the operator have the ability
to specify if
>>>>>>>>>> this
>>>>>>>>>> operator can be accessible from outside and which
or all
>>>>>>>>>>
>>>>>>>>> properties
>>>>>>>>>
>>>>>>>>>> can
>>>>>>>>>>
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>>> accessible.
>>>>>>>>>>
>>>>>>>>>>>              b. Provide methods in ModuleMeta
or elsewhere to set
>>>>>>>>>>>>
>>>>>>>>>>>> property
>>>>>>>>>>>>
>>>>>>>>>>> values by specifying the operator name (friendly
name) inside the
>>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>>
>>>>>>>>>>> and property name. If this is allowed by a. above
it is
>>>>>>>>>> successful
>>>>>>>>>>
>>>>>>>>>> else
>>>>>>>>>>>
>>>>>>>>>>> it
>>>>>>>>>>
>>>>>>>>>> should fail.
>>>>>>>>>>
>>>>>>>>>>>              c. Allow a syntax in property files
to specify the
>>>>>>>>>>>>
>>>>>>>>>>>> property
>>>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>
>>>>>>>>> b.
>>>>>>>>>
>>>>>>>>>> Example syntax
>>>>>>>>>>>
>>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.prop.<
>>>>>>>>>>>>
>>>>>>>>>>> propname>
>>>>>>>>>>
>>>>>>>>>         4. For attributes same mechanism as in 3 should
apply for
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>>> operators
>>>>>>>>>>>> that are exposed by the module.  For property
file, example
>>>>>>>>>>>> syntax
>>>>>>>>>>>> dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>>>>>         5. Module developers in addition
to 3. and 4. above may
>>>>>>>>>>>>
>>>>>>>>>>>> choose
>>>>>>>>>>>>
>>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>> support module level properties and attributes. These
should not
>>>>>>>>
>>>>>>>>> be
>>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>> default when 3. and 4. are possible but complementary,
in addition
>>>>>>>>>
>>>>>>>>>> to
>>>>>>>>>>>
>>>>>>>>>>> them.
>>>>>>>>>> In this case for properties they can implement setters
and getters
>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>>>
>>>>>>>>>>> the
>>>>>>>>>> module. For attributes the user should still be able
to set the
>>>>>>>>>>
>>>>>>>>>>> attributes
>>>>>>>>>>> using the dag setAttribute method. You could
introduce a method
>>>>>>>>>>> in
>>>>>>>>>>> the
>>>>>>>>>>>
>>>>>>>>>>> module to process attributes that can get called
by the engine
>>>>>>>>>> once
>>>>>>>>>> everything is set.
>>>>>>>>>>
>>>>>>>>>         6. For 5. above setting global properties and
attributes
>>>>>>>>> for
>>>>>>>>>
>>>>>>>>>> module
>>>>>>>>>>>>
>>>>>>>>>>> is
>>>>>>>>>>
>>>>>>>>>> akin to ideas that have been proposed for the application
as
>>>>>>>>>>>
>>>>>>>>>>>> well. A
>>>>>>>>>>>>
>>>>>>>>>>> consistent way must be possible for applications
as well even if
>>>>>>>>>>
>>>>>>>>> it
>>>>>>>>>
>>>>>>>>>> is
>>>>>>>>>>
>>>>>>>>> not
>>>>>>>>
>>>>>>>>> implemented now.
>>>>>>>>>>
>>>>>>>>>>>         7. For 5. or 6. above there should be
a property file way
>>>>>>>>>>>> of
>>>>>>>>>>>> specifying
>>>>>>>>>>>> the global module properties and attributes.
Example syntax
>>>>>>>>>>>> dt.module.<modulename>.prop.<propname>,
>>>>>>>>>>>> dt.module.<modulename>.attr.<attrname>.
>>>>>>>>>>>> Notice the difference with 3. c. and 4 above
that there is no
>>>>>>>>>>>>
>>>>>>>>>>>> operator
>>>>>>>>>>>>
>>>>>>>>>>> keyword here.
>>>>>>>>>>         8. Partitioning needs to be consistent with
what the user
>>>>>>>>>>
>>>>>>>>>>> will
>>>>>>>>>>>>
>>>>>>>>>>> expect
>>>>>>>>>>
>>>>>>>>> when they see module as an entity. I will send an image
of
>>>>>>>>>
>>>>>>>>>> possible
>>>>>>>>>>>
>>>>>>>>>>> examples of how the user will expect the physical
plan to look in
>>>>>>>>>>
>>>>>>>>> certain
>>>>>>>>>
>>>>>>>>>> cases.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message