apex-dev mailing list archives

From Chinmay Kolhatkar <chin...@datatorrent.com>
Subject Re: Modules
Date Fri, 06 Nov 2015 07:19:57 GMT
Hi All,

As requested, I've updated the pull request to scope the DAG object
locally. That is, the DAG object passed to a module's populateDAG is no
longer the top-level application DAG object; it is specific to the module.
In other words, module encapsulation is now maintained.

Here is the design for that:
1) ModuleMeta gets a new LogicalPlan field. This DAG (LogicalPlan) object
holds the logical plan of that specific module.
2) ModuleMeta also gets a new method called flattenDAG. This method does
the following:
   a. Calls populateDAG of the current module. The DAG object passed is the
one held in ModuleMeta.
   b. Calling populateDAG may add more modules to this module's DAG, so
flattenDAG is then called recursively on each submodule.
   c. Calls a method named "applyStreamLinks" on the current module's DAG
to resolve module ports to operator ports.
   d. Finally, adds the current module's DAG to its parent (which can be
another module or the application).

The pseudocode looks like this:
LogicalPlan::ModuleMeta {
  private Module module;
  private LogicalPlan dag;   // module-local DAG

  flattenDAG(parentDAG, conf)
  {
    module.populateDAG(dag, conf);
    forall (subModuleMeta in dag.getAllModules())
    {
      subModuleMeta.flattenDAG(dag, conf);
    }
    dag.applyStreamLinks();
    parentDAG.addDAGToCurrentDAG(dag);
  }
}


LogicalPlanConfiguration::prepareDAG()
{
  app.populateDAG(appDAG, conf);
  forall (moduleMeta in appDAG.getAllModules())
  {
    moduleMeta.flattenDAG(appDAG, conf);
  }
  appDAG.applyStreamLinks();
}
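
To make the recursion concrete, here is a minimal, self-contained Java
sketch of the same flow. It is not the code in the pull request: Dag,
SketchModule and the "module.operator" naming scheme are hypothetical
stand-ins for LogicalPlan and Module, the conf argument is dropped, and
stream-link resolution (applyStreamLinks) is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ModuleFlattenSketch {

  /** Hypothetical stand-in for the Module interface. */
  public interface SketchModule {
    void populateDAG(Dag dag);
  }

  /** Hypothetical stand-in for LogicalPlan: operator names plus nested modules. */
  public static class Dag {
    public final List<String> operators = new ArrayList<>();
    public final Map<String, ModuleMeta> modules = new LinkedHashMap<>();

    public void addOperator(String name) {
      operators.add(name);
    }

    public void addModule(String name, SketchModule module) {
      modules.put(name, new ModuleMeta(name, module));
    }
  }

  public static class ModuleMeta {
    private final String name;
    private final SketchModule module;
    // Module-local DAG; the module never sees the application DAG.
    private final Dag dag = new Dag();

    ModuleMeta(String name, SketchModule module) {
      this.name = name;
      this.module = module;
    }

    public void flattenDAG(Dag parentDag) {
      // a. The module populates only its own DAG.
      module.populateDAG(dag);
      // b. Submodules added in step a are flattened into this module's DAG.
      for (ModuleMeta sub : dag.modules.values()) {
        sub.flattenDAG(dag);
      }
      // c. applyStreamLinks() would resolve module ports here (omitted).
      // d. Merge into the parent, prefixing names (an assumed naming scheme).
      for (String op : dag.operators) {
        parentDag.addOperator(name + "." + op);
      }
    }
  }

  /** Mirrors prepareDAG: populate the application DAG, then flatten its modules. */
  public static Dag prepareDAG(SketchModule app) {
    Dag appDag = new Dag();
    app.populateDAG(appDag);
    for (ModuleMeta meta : appDag.modules.values()) {
      meta.flattenDAG(appDag);
    }
    return appDag;
  }

  public static void main(String[] args) {
    SketchModule inner = dag -> dag.addOperator("parser");
    SketchModule outer = dag -> {
      dag.addOperator("reader");
      dag.addModule("inner", inner);
    };
    SketchModule app = dag -> {
      dag.addOperator("console");
      dag.addModule("outer", outer);
    };
    // prints [console, outer.reader, outer.inner.parser]
    System.out.println(prepareDAG(app).operators);
  }
}
```

Because each ModuleMeta owns its DAG, a module can only add operators and
submodules to its own scope; the parent sees them only after flattening.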

Please review the pull request:
https://github.com/apache/incubator-apex-core/pull/148

Thanks,
Chinmay.

On Wed, Nov 4, 2015 at 2:24 AM, Vlad Rozov <v.rozov@datatorrent.com> wrote:

> It is static vs. dynamic binding. With static binding it is much easier to
> certify module behavior. With dynamic binding, run-time module behavior
> depends on *correct* implementation of semantic versioning by the embedded
> operators. An operator may introduce a new mandatory property; such a change
> will not be caught by the semantic version verification check, but will
> break module behavior.
>
> Who will be responsible for shading the dependencies? Is it module
> designer (not clear why) or application designer? Should the functionality
> be provided by the Apache Apex platform, not by module/application
> designers?
>
> I suggest that we support the first option only for now and see if we need
> to support the second option in the future.
>
> Thank you,
>
> Vlad
>
>
> On 11/3/15 10:49, Thomas Weise wrote:
>
>> Second option can be done by shading the dependencies. More reliable and
>> simpler than class loader tricks.
>>
>> But why would you need the second option to start with? Why would an
>> application define conflicting dependencies? I don't see that happen often
>> when dependencies follow semantic versioning.
>>
>>
>> On Tue, Nov 3, 2015 at 10:26 AM, Vlad Rozov <v.rozov@datatorrent.com>
>> wrote:
>>
>>> I think that the build tool alone will not provide the necessary
>>> functionality for the second option. A module will have to implement its
>>> own class loader; otherwise behavior will be undefined when an application
>>> embeds a module and, independently, a different version of an operator
>>> included in the module.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On 11/3/15 10:10, Thomas Weise wrote:
>>>
>>>> Both are valid options and to be solved through the build tool. In one
>>>> case the dependency is declared, in the latter embedded.
>>>>
>>>> Thomas
>>>>
>>>> On Tue, Nov 3, 2015 at 10:07 AM, Vlad Rozov <v.rozov@datatorrent.com>
>>>> wrote:
>>>>
>>>>> This brings up one more issue that we did not cover in the module design
>>>>> discussions. What happens when new versions of operators embedded into a
>>>>> module become available? I believe we all go with the assumption that
>>>>> the module will pick up the version available on the classpath at
>>>>> run-time and that it is the operator developer's responsibility to
>>>>> provide full binary compatibility. Another possible behavior is to
>>>>> consider a module as a completely independent unit and package all
>>>>> necessary libraries along with the module.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>>
>>>>> On 11/3/15 09:40, Pramod Immaneni wrote:
>>>>>
>>>>>> I am not suggesting we leak the internals or compromise access
>>>>>> modifiers. I want the module developer to have the ability (not
>>>>>> mandatory) to make available all or a subset of the properties of an
>>>>>> operator easily if they desire without having to create setter/getter
>>>>>> for each of them. You don't have to expose the operator they belong
>>>>>> to. My preference would also be to preserve the namespace of the
>>>>>> properties in some way, for example by grouping them by operator name.
>>>>>> Think about scenario where people have built modules using kafka input
>>>>>> operator and there is a new kafka connection property. Without having
>>>>>> this ability the modules have to be changed to support this property.
>>>>>> With this feature the module developers have a choice whether to keep
>>>>>> the list of kafka properties fixed in the module or allow new
>>>>>> properties.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <thomas@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>>> There is also the option to inherit common module properties through
>>>>>>> a base class.
>>>>>>>
>>>>>>> I don't see how this is any different from an operator. The developer
>>>>>>> decides what gets exposed and has the same options to control it.
>>>>>>>
>>>>>>> Encapsulation is good practice, by leaking the module internals the
>>>>>>> using code becomes brittle.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <amol@datatorrent.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The same goes for a Java or C++ class that changes its api. In
>>>>>>>> general this is left to the developer, and these languages have
>>>>>>>> internals as private by default for precisely the same purpose. The
>>>>>>>> module developer must have the right to change internals, keep api
>>>>>>>> clean/constant and expect user code to not break.
>>>>>>>
>>>>>>>> Thks,
>>>>>>>> Amol
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni
>>>>>>>> <pramod@datatorrent.com> wrote:
>>>>>>>>
>>>>>>>>> For 3 and 4 can't we strike a balance between not having to expose
>>>>>>>>> the operators underneath and at the same time not having to write
>>>>>>>>> boilerplate code for all the properties that the module wants to
>>>>>>>>> make available outside. It can quickly become unmanageable. For
>>>>>>>>> example, an input operator has a new connection property which can
>>>>>>>>> be used outside and now all the modules using that operator, their
>>>>>>>>> code has to be modified to just add a pass through setter/getter.
>>>>>>>>> How about treating the operator name as a group name and ability
>>>>>>>>> for module developers to easily make available/specify all or a
>>>>>>>>> subset of the properties of an operator to the user without having
>>>>>>>>> to explicitly make each of them a module property.
>>>>>>>>
>>>>>>>>> On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <amol@datatorrent.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> 3,4 should follow conventions where the creator decides the api
>>>>>>>>>> (including accessibility). In general only those properties
>>>>>>>>>> exposed by module creator should be settable. What the module
>>>>>>>>>> internally does with them is module designer's call. Accessing
>>>>>>>>>> internals of module from outside is uncommon. For example in Java
>>>>>>>>>> (or C++) private fields/members are not to be accessed.
>>>>>>>>>>
>>>>>>>>>> Properties (setter and getter) are the api that module designer
>>>>>>>>>> gives to the module user. It is dangerous and has unintended
>>>>>>>>>> consequences if module user starts to access internals outside
>>>>>>>>>> the api.
>>>>>>>>>>
>>>>>>>>>> Partitioning should be next phase. As long as current design does
>>>>>>>>>> not halt partitioning it should be ok (which I believe is true).
>>>>>>>>>
>>>>>>>>>> Thks,
>>>>>>>>>> Amol
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov
>>>>>>>>>> <v.rozov@datatorrent.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> 1. +1, though passing the original DAG to a module's
>>>>>>>>>>> populateDAG() is not by design; it is a detail of the current
>>>>>>>>>>> pull request implementation.
>>>>>>>>>>>
>>>>>>>>>>> 2. While I agree that both Module and StreamingApplication let
>>>>>>>>>>> the module/application designer expose a DAG design reuse
>>>>>>>>>>> pattern and the StreamingApplication interface may be extending
>>>>>>>>>>> Module, it does not seem to buy us much. Do we want to allow
>>>>>>>>>>> certain applications to be reused as Modules in other
>>>>>>>>>>> applications or should application package be different from
>>>>>>>>>>> Module package? The current approach is to distribute Modules as
>>>>>>>>>>> part of .jar, for example as part of Malhar library, without
>>>>>>>>>>> necessarily providing all necessary dependencies. Application
>>>>>>>>>>> package on other side must include all dependencies not provided
>>>>>>>>>>> by the platform.
>>>>>>>>>>>
>>>>>>>>>>> 3, 4. While this will help Module designer, it may complicate
>>>>>>>>>>> Module maintenance and how Modules are used. What if Module
>>>>>>>>>>> designer wants to change its implementation and replace one
>>>>>>>>>>> operator implementation with another operator? Does
>>>>>>>>>>> StreamingApplication designer need to know internal structure of
>>>>>>>>>>> Modules? Should Module be considered as a black box during
>>>>>>>>>>> Application design time as it was initially planned?
>>>>>>>>>>>
>>>>>>>>>>> 5, 6, 7 +1. This is currently proposed behavior of Module
>>>>>>>>>>> functionality the way I understand it.
>>>>>>>>>>>
>>>>>>>>>>> 8. We need to see what Module designer can specify for
>>>>>>>>>>> partitioning. One of supported cases should be ability to
>>>>>>>>>>> specify cascading partitioning scheme.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>>
>>>>>>>>>>> Vlad
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11/2/15 10:30, Pramod Immaneni wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I have some comments and suggestions on the module design. I
>>>>>>>>>>>> think these need to be taken into account before we can merge
>>>>>>>>>>>> the implementation provided below into the mainline code. I
>>>>>>>>>>>> apologize if these should have been brought up earlier as for
>>>>>>>>>>>> some reason or the other I was out of loop on this one
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/apache/incubator-apex-core/pull/148
>>>>>>>>>>>> <https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963>
>>>>>>>>>>>>
>>>>>>>>>>>> 1. DAG scoping currently in the implementation is global for
>>>>>>>>>>>>    modules, each module's populateDAG sees the entire DAG. It
>>>>>>>>>>>>    should be locally scoped as one module does not and should
>>>>>>>>>>>>    not know about another.
>>>>>>>>>>>> 2. The module has a populateDAG method with exact same syntax
>>>>>>>>>>>>    as in StreamingApplication. Is StreamingApplication also a
>>>>>>>>>>>>    module, should it extend that interface.
>>>>>>>>>>>> 3. Setting properties for modules is too verbose. Module
>>>>>>>>>>>>    developer needs to repeat every property they want exposed
>>>>>>>>>>>>    with a setter and getter in JAVA. I don't disagree that
>>>>>>>>>>>>    module developer should be able to choose which properties
>>>>>>>>>>>>    from which operators need to be exposed but the current way
>>>>>>>>>>>>    seems to duplicate code. Here is a suggestion.
>>>>>>>>>>>>    a. Allow modules to specify which operators and properties
>>>>>>>>>>>>       can be accessible from outside. One way is in the
>>>>>>>>>>>>       "populateDAG" method of the module when adding the
>>>>>>>>>>>>       operator have the ability to specify if this operator
>>>>>>>>>>>>       can be accessible from outside and which or all
>>>>>>>>>>>>       properties can be accessible.
>>>>>>>>>>>>    b. Provide methods in ModuleMeta or elsewhere to set
>>>>>>>>>>>>       property values by specifying the operator name
>>>>>>>>>>>>       (friendly name) inside the module and property name. If
>>>>>>>>>>>>       this is allowed by a. above it is successful else it
>>>>>>>>>>>>       should fail.
>>>>>>>>>>>>    c. Allow a syntax in property files to specify the property
>>>>>>>>>>>>       in b. Example syntax
>>>>>>>>>>>>       dt.module.<modulename>.operator.<operatorname>.prop.<propname>
>>>>>>>>>>>> 4. For attributes same mechanism as in 3 should apply for the
>>>>>>>>>>>>    operators that are exposed by the module. For property
>>>>>>>>>>>>    file, example syntax
>>>>>>>>>>>>    dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
>>>>>>>>>>>> 5. Module developers in addition to 3. and 4. above may choose
>>>>>>>>>>>>    to support module level properties and attributes. These
>>>>>>>>>>>>    should not be the default when 3. and 4. are possible but
>>>>>>>>>>>>    complementary, in addition to them. In this case for
>>>>>>>>>>>>    properties they can implement setters and getters in the
>>>>>>>>>>>>    module. For attributes the user should still be able to set
>>>>>>>>>>>>    the attributes using the dag setAttribute method. You could
>>>>>>>>>>>>    introduce a method in the module to process attributes that
>>>>>>>>>>>>    can get called by the engine once everything is set.
>>>>>>>>>>>> 6. For 5. above setting global properties and attributes for
>>>>>>>>>>>>    module is akin to ideas that have been proposed for the
>>>>>>>>>>>>    application as well. A consistent way must be possible for
>>>>>>>>>>>>    applications as well even if it is not implemented now.
>>>>>>>>>>>> 7. For 5. or 6. above there should be a property file way of
>>>>>>>>>>>>    specifying the global module properties and attributes.
>>>>>>>>>>>>    Example syntax dt.module.<modulename>.prop.<propname>,
>>>>>>>>>>>>    dt.module.<modulename>.attr.<attrname>. Notice the
>>>>>>>>>>>>    difference with 3. c. and 4 above that there is no operator
>>>>>>>>>>>>    keyword here.
>>>>>>>>>>>> 8. Partitioning needs to be consistent with what the user will
>>>>>>>>>>>>    expect when they see module as an entity. I will send an
>>>>>>>>>>>>    image of possible examples of how the user will expect the
>>>>>>>>>>>>    physical plan to look in certain cases.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>
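
For reference, the property-file key syntax proposed in items 3c, 4 and 7
of the oldest quoted message could be parsed roughly as follows. This is
only a sketch of the proposal as written in that message, not an existing
Apex API: the class and field names are made up, and it assumes module and
operator names contain no dots.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModuleKeySketch {

  // dt.module.<modulename>[.operator.<operatorname>].(prop|attr).<name>
  private static final Pattern KEY = Pattern.compile(
      "dt\\.module\\.([^.]+)(?:\\.operator\\.([^.]+))?\\.(prop|attr)\\.(.+)");

  /** Hypothetical holder for what a key addresses. */
  public static final class Target {
    public final String module;
    public final String operator; // null for module-level keys (item 7)
    public final boolean attribute; // true for .attr., false for .prop.
    public final String name;

    Target(String module, String operator, boolean attribute, String name) {
      this.module = module;
      this.operator = operator;
      this.attribute = attribute;
      this.name = name;
    }
  }

  /** Returns the parsed target, or empty if the key does not follow the syntax. */
  public static Optional<Target> parse(String key) {
    Matcher m = KEY.matcher(key);
    if (!m.matches()) {
      return Optional.empty();
    }
    return Optional.of(
        new Target(m.group(1), m.group(2), "attr".equals(m.group(3)), m.group(4)));
  }

  public static void main(String[] args) {
    Target t = parse("dt.module.ingest.operator.kafka.prop.topic").get();
    System.out.println(t.module + " / " + t.operator + " / " + t.name);
  }
}
```

A key without the operator segment yields a null operator, which matches
the distinction item 7 draws between module-level and operator-level keys.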
