apex-dev mailing list archives

From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: Schema Discovery Support in Apex Applications
Date Tue, 17 Jan 2017 06:41:37 GMT
+1 for the feature.

~ Bhupesh

On Mon, Jan 16, 2017 at 5:09 PM, Chinmay Kolhatkar <chinmay@apache.org>
wrote:

> Those are not really anonymous POJOs. The definition of the POJO will be
> known to the user, because the upstream operator uses that definition to
> convey the tuple type it will be emitting.
> Using that information, the user can configure the operators. Those
> properties will be a bit different, though.
>
> On Mon, Jan 16, 2017 at 4:20 PM, AJAY GUPTA <ajaygit158@gmail.com> wrote:
>
> > +1 for the idea.
> >
> > I just had one question.
> >
> > As I understand, there will be some form of Anonymous POJO used as objects
> > to pass information from one operator to another. Can you share how the
> > user/operator developer would access the tuple object in case he wishes to
> > do something with it?
> >
> >
> > Ajay
> >
> > On Mon, Jan 16, 2017 at 2:53 PM, Chinmay Kolhatkar <chinmay@apache.org>
> > wrote:
> >
> > > Hi All,
> > >
> > > Currently, if a user-generated DAG contains any POJOfied operators, the
> > > TUPLE_CLASS attribute needs to be set on each and every port that
> > > receives or sends a POJO.
> > >
> > > For example, if a DAG is File -> Parser -> Transform -> Dedup ->
> > > Formatter -> Kafka, then the user needs to set the TUPLE_CLASS attribute
> > > on both the input and output ports of the Transform and Dedup operators,
> > > and also on the Parser output and the Formatter input.
> > >
> > > The proposal here is to reduce the work required of the user to configure
> > > the DAG. Technically speaking, if an operator knows its input schema and
> > > processing properties, it can determine its output schema and convey it to
> > > downstream operators. This way the complete pipeline can be configured
> > > without the user setting TUPLE_CLASS or even creating POJOs and adding
> > > them to the classpath.
> > >
> > > On the same idea, I want to propose an approach where the pipeline can be
> > > configured without the user setting TUPLE_CLASS or even creating POJOs and
> > > adding them to the classpath.
> > > Here is the document, which explains the idea and a high-level design:
> > > https://docs.google.com/document/d/1ibLQ1KYCLTeufG7dLoHyN_tRQXEM3LR-7o_S0z_porQ/edit?usp=sharing
> > >
> > > I would like to get the community's opinion on the feasibility and
> > > applications of this proposal.
> > > Once we reach some consensus, we can discuss the design in detail.
> > >
> > > Thanks,
> > > Chinmay.
> > >
> >
>
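
For readers following the thread, the core idea in the quoted proposal (each operator derives its output schema from its input schema and hands it downstream) can be illustrated with a small sketch. This is not the design from the linked document and does not use any Apex API: the names SchemaPropagationSketch, SchemaAwareStage, addField, dedupOn and propagate are all hypothetical, and a schema is assumed to be a simple ordered map from field name to Java type.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java, no Apex classes. All names are hypothetical.
public class SchemaPropagationSketch {

    /** Hypothetical contract: given the input schema, report the output schema. */
    @FunctionalInterface
    interface SchemaAwareStage {
        Map<String, Class<?>> outputSchema(Map<String, Class<?>> inputSchema);
    }

    /** Transform-like stage: keeps all input fields and adds one derived field. */
    static SchemaAwareStage addField(String name, Class<?> type) {
        return input -> {
            Map<String, Class<?>> out = new LinkedHashMap<>(input);
            out.put(name, type);
            return out;
        };
    }

    /** Dedup-like stage: schema passes through unchanged, but the key field must exist. */
    static SchemaAwareStage dedupOn(String keyField) {
        return input -> {
            if (!input.containsKey(keyField)) {
                throw new IllegalArgumentException("Missing dedup key: " + keyField);
            }
            return new LinkedHashMap<>(input);
        };
    }

    /** Walks the pipeline once, feeding each stage's output schema to the next stage. */
    static Map<String, Class<?>> propagate(Map<String, Class<?>> sourceSchema,
                                           List<SchemaAwareStage> stages) {
        Map<String, Class<?>> current = sourceSchema;
        for (SchemaAwareStage stage : stages) {
            current = stage.outputSchema(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Schema assumed to be known at the parser output (the only place it is declared).
        Map<String, Class<?>> parsed = new LinkedHashMap<>();
        parsed.put("id", Long.class);
        parsed.put("amount", Double.class);

        // Transform -> Dedup: the schema reaching the formatter is derived, not configured.
        Map<String, Class<?>> atFormatter = propagate(parsed,
                Arrays.asList(addField("amountWithTax", Double.class), dedupOn("id")));

        System.out.println(atFormatter.keySet()); // prints [id, amount, amountWithTax]
    }
}
```

In this sketch the schema is declared once at the source, and a misconfigured stage (for example, a dedup key that no upstream stage produces) fails while the schema is being propagated rather than when tuples start flowing.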
