apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: Modules support in Apex
Date Thu, 03 Sep 2015 15:33:15 GMT
Atri,
Great questions. Module flattening will mainly happen at launch time. Here
is the flow:

Design time -> Launch time -> Run time
// QA, tests, benchmarking, etc. are orthogonal to these, as each of them
will walk through the same flow

Design time -> design, try, iterate => Re-use of IP saves a lot of time
here. Operators help as leaf level IP, Modules help as higher level IP that
can be made by combining previously tested leaf level IP. Additionally they
should (a) be correct by construction, since they leverage other operators,
and (b) allow properties and attributes that make re-used IP a lot more
powerful. Modules do not prohibit the Apex engine from optimizing the DAG
during launch (or even run) time, all the while enabling re-use at a higher
level. In fact they may aid optimization by giving specific hints at a
sub-DAG level. Same as functions, templates, etc. in C++. This phase is
mainly human time. This results in a "logical plan" of the App DAG.

Launch time -> Usually a one time cost. This is always completely
automated in any compiler (aka compile time). So stuffing more here moves
us from human time to computer time. A module moves a lot more work from
human time to computer time as compared to a leaf level operator. This
results in the physical plan of the App DAG.

Run time -> Actual app run time. At this stage the flattening has already
happened. Apex apps are meant to run forever or for a long time. These are
not 1 min apps. So an extra 10 seconds during launch time in a big data
project is amortized away over days, months, or at least a few hours.
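
To make the launch-time flattening step concrete, here is a minimal sketch in
Java. It is not the Apex API: PlanNode, LeafOperator, CompositeModule and
FlattenSketch are hypothetical names made up for illustration. The logical
plan may contain modules, and flattening replaces each module with its leaf
operators, keeping the module name as a scope prefix (an assumption) so an
operator can still be traced back to its module.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration only; these are NOT the Apex API.
interface PlanNode {
    String name();
}

// A leaf level building block of the DAG.
final class LeafOperator implements PlanNode {
    private final String name;
    LeafOperator(String name) { this.name = name; }
    public String name() { return name; }
}

// A module groups operators (and possibly other modules) under one name.
final class CompositeModule implements PlanNode {
    private final String name;
    private final List<PlanNode> children = new ArrayList<>();
    CompositeModule(String name) { this.name = name; }
    CompositeModule add(PlanNode child) { children.add(child); return this; }
    public String name() { return name; }
    List<PlanNode> children() { return children; }
}

final class FlattenSketch {
    // Launch-time step: walk the logical plan and emit only leaf operators.
    static List<String> flatten(List<PlanNode> logicalPlan) {
        List<String> flat = new ArrayList<>();
        for (PlanNode node : logicalPlan) {
            collect(node, "", flat);
        }
        return flat;
    }

    // Recursively expands modules; names are qualified by module scope.
    private static void collect(PlanNode node, String scope, List<String> flat) {
        String qualified = scope.isEmpty() ? node.name() : scope + "." + node.name();
        if (node instanceof CompositeModule) {
            for (PlanNode child : ((CompositeModule) node).children()) {
                collect(child, qualified, flat);
            }
        } else {
            flat.add(qualified);
        }
    }

    public static void main(String[] args) {
        CompositeModule parser = new CompositeModule("parser")
                .add(new LeafOperator("reader"))
                .add(new LeafOperator("csv"));
        List<PlanNode> plan = Arrays.asList(
                new LeafOperator("input"), parser, new LeafOperator("output"));
        // prints [input, parser.reader, parser.csv, output]
        System.out.println(flatten(plan));
    }
}
```

After flattening, the run-time side only ever sees leaf operators, which is
why the extra launch-time work does not cost anything once the app is running.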

Additionally Modules open up the Apex engine to be used by any software that
is able to say "give me a spec, I will give back my sub-DAG". This will
enable others in the community to enrich/migrate app IP without Apex
developers being involved. I believe that is the ultimate gain: Modules
remove us (Apex developers) from being a bottleneck in adding more IP into
Apex apps. Operators do that at leaf level, modules enable true distributed
execution. I am hoping to see this thesis proved by making Calcite
integration easier.
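
The "give me a spec, I will give back my sub-DAG" contract could look roughly
like the sketch below. Everything in it is an assumption for illustration:
SubDagProvider, DedupModuleSketch, the "persistent" property and the edge
strings are hypothetical and not part of Apex. It also shows the point made
earlier in the thread that one module may expand to different subgraphs
depending on its properties.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical contract, NOT an Apex interface: spec in, sub-DAG out.
interface SubDagProvider {
    // Returns the sub-DAG as a list of stream edges written as "from -> to".
    List<String> expand(Map<String, String> spec);
}

// Example module whose expansion depends on a (made-up) "persistent" property.
final class DedupModuleSketch implements SubDagProvider {
    @Override
    public List<String> expand(Map<String, String> spec) {
        List<String> edges = new ArrayList<>();
        edges.add("in -> hash");
        if ("true".equals(spec.get("persistent"))) {
            // With persistence requested, route through an extra store operator.
            edges.add("hash -> store");
            edges.add("store -> out");
        } else {
            edges.add("hash -> out");
        }
        return edges;
    }

    public static void main(String[] args) {
        SubDagProvider module = new DedupModuleSketch();
        // prints [in -> hash, hash -> out]
        System.out.println(module.expand(Collections.<String, String>emptyMap()));
        // prints [in -> hash, hash -> store, store -> out]
        System.out.println(module.expand(Collections.singletonMap("persistent", "true")));
    }
}
```

Whoever writes such a provider only needs to honor the contract; the engine
side can flatten and validate the returned sub-DAG like any other part of the
plan.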

Thks,
Amol

On Thu, Sep 3, 2015 at 1:12 AM, Atri Sharma <atri@apache.org> wrote:

> Amol.
>
> For my understanding, when you mention launch time/code generation time,
> are you referring to generation of physical plan, please?
>
> Regards,
>
> Atri
>
> On Thu, Sep 3, 2015 at 12:48 PM, Amol Kekre <amol@datatorrent.com> wrote:
>
> > Atri,
> > For a lot of operations module should be treated as a black box. It is
> > just another reusable IP. The flattening should happen at launch time.
> >
> > If we think of Apex as a compiler, then all the compile time checks
> > (ports connectivity, matching types/schema, properties, attributes,
> > ...) are as applicable to modules as to operators. At launch time (aka
> > code generation time) module gets flattened. Webservice should still
> > enable access via module scope on a running app.
> >
> > Thks,
> > Amol
> >
> >
> > On Thu, Sep 3, 2015 at 12:10 AM, Atri Sharma <atri@apache.org> wrote:
> >
> > > So the idea around our APEX-3 work will be that we will implement
> > > Module interface to build a class that adds operators at runtime?
> > > Sounds like a good idea, if Module is essentially a set of operators
> > > plugged in DAG.
> > >
> > > Is Module to be treated like a black box with input and output ports,
> > > and the internal subgraph either generated statically or dynamically?
> > >
> > > On Tue, Sep 1, 2015 at 2:47 AM, Vlad Rozov <v.rozov@datatorrent.com>
> > > wrote:
> > >
> > > > Atri,
> > > >
> > > > > As a first cut module is a predefined subgraph that can be inserted
> > > > > into a DAG. Generally speaking module is not required to expand to
> > > > > the same subgraph. Depending on module properties or configuration,
> > > > > module may expand to different subgraphs. At the same time, similar
> > > > > to an operator module has predefined input/output ports and
> > > > > properties.
> > > >
> > > > Thank you,
> > > >
> > > > Vlad
> > > >
> > > >
> > > > On 8/31/15 11:46, Atri Sharma wrote:
> > > >
> > > > >> No, I don't think APEX-3's functionality needs to be exceeded for
> > > > >> this. What I am trying to understand here is the concept of Module.
> > > > >> Is it a family of operators defined by same interface implementation
> > > > >> *or* is it a defined subgraph that can be replaced and used as a
> > > > >> part of a subgraph instead of building the whole connection again?
> > > >>
> > > > >> On Mon, Aug 31, 2015 at 11:27 PM, Tushar Gosavi
> > > > >> <tushar@datatorrent.com> wrote:
> > > >>
> > > > >>> Yes, you are correct. APEX-3 is for dynamic expansion of the DAG.
> > > > >>> If we can expand the DAG dynamically then we can use that
> > > > >>> functionality to expand the DAG with a known list of operators and
> > > > >>> connections between them. In a way APEX-3 provides bigger
> > > > >>> functionality than APEX-55.
> > > > >>>
> > > > >>> Let me know if you think that module functionality requires more
> > > > >>> support than what will be provided by APEX-3.
> > > >>>
> > > >>>
> > > >>>
> > > > >>> On Mon, Aug 31, 2015 at 10:39 AM, Atri Sharma <atri@apache.org>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Looks interesting.
> > > > >>>>
> > > > >>>> Can you explain a bit on how it helps APEX-3? The objective of
> > > > >>>> APEX-3 is to have dynamic expansion of sub DAGs. If I understand
> > > > >>>> correctly, the static list is to be known when declaring a module.
> > > > >>>>
> > > > >>>> Please correct me if I am wrong.
> > > >>>>
> > > > >>>> On Mon, Aug 31, 2015 at 11:05 PM, Tushar Gosavi
> > > > >>>> <tushar@datatorrent.com> wrote:
> > > > >>>>
> > > > >>>>> Hi All,
> > > > >>>>>
> > > > >>>>> We are working on adding support for Modules in Apex. A module is
> > > > >>>>> a group of operators that will have their own existence and will
> > > > >>>>> ease the way we are currently defining an application.
> > > >>>>>
> > > >>>>> A module is defined by:
> > > >>>>>
> > > >>>>>     - list of operators.
> > > >>>>>     - list of input and output ports.
> > > >>>>>     - set of properties for the module.
> > > >>>>>     - set of attributes for the module.
> > > >>>>>
> > > >>>>>
> > > > >>>>> Details on the proposed work are given at:
> > > > >>>>>
> > > > >>>>> https://malhar.atlassian.net/browse/APEX-55 . The work is also
> > > > >>>>> related to https://malhar.atlassian.net/browse/APEX-3
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> -Tushar.
> > > >>>>>
> > > >>>>>
> > > >
> > >
> >
>
