apex-dev mailing list archives

From Thomas Weise <tho...@datatorrent.com>
Subject Re: Adding Enrichment operator to malhar
Date Tue, 15 Mar 2016 14:10:38 GMT
It is possible to accomplish this through an "advanced" property setter,
but given most users won't need this, it's not important for the first
iteration.
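A minimal sketch of what such an "advanced" setter could look like, assuming the goal is to avoid changing the store in the middle of a stream: the setter only stages the new loader, and the operator switches to it at the next window boundary. `Loader`, `SwappableEnricher`, `setLoader` and `lookup` are hypothetical names for illustration. `beginWindow(long)` only mirrors the name of the Apex operator callback; this class does not implement any Apex interface.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified lookup contract (not the malhar DBLoader API).
interface Loader {
  Object get(Object key);
}

// Hypothetical operator sketch: loader changes are staged by the setter and
// only take effect at a window boundary, never in the middle of a window.
class SwappableEnricher {
  // Loader used for every tuple of the current window.
  private Loader active;
  // Loader requested at runtime; null when no change is pending.
  private final AtomicReference<Loader> pending = new AtomicReference<>();

  SwappableEnricher(Loader initial) {
    this.active = initial;
  }

  // "Advanced" property setter: may be called at any time, for example from a
  // configuration-change thread; it never touches the active loader directly.
  void setLoader(Loader next) {
    pending.set(next);
  }

  // Called at the start of each window; applies a staged loader, if any.
  void beginWindow(long windowId) {
    Loader next = pending.getAndSet(null);
    if (next != null) {
      active = next;
    }
  }

  // Looks up reference data for one tuple key using the active loader.
  Object lookup(Object key) {
    return active.get(key);
  }
}
```

Deferring the switch to a window boundary is one possible answer to the question of holding the stream during a configuration change; whether the platform can deliver such a property change to a running operator is not settled in this thread.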

--
sent from mobile
On Mar 15, 2016 6:59 AM, "Chinmay Kolhatkar" <chinmay@apache.org> wrote:

> That's a great suggestion.
> But before we do that, we need to figure out how we can hold a flowing
> stream while runtime configuration changes are happening.
> I'm not aware of such a feature in the platform.
>
> Though, I suggest we get the above basic functionality into malhar in the
> first phase of the Enrichment operator; can we take care of this later?
>
> Thanks,
> Chinmay.
>
>
> On Tue, Mar 15, 2016 at 7:22 PM, Mohit Jotwani <mohit@datatorrent.com>
> wrote:
>
> > Actually, I meant dynamically - will that be allowed?
> >
> > Regards,
> > Mohit
> >
> > On Tue, Mar 15, 2016 at 7:18 PM, Chinmay Kolhatkar <chinmay@apache.org>
> > wrote:
> >
> > > Yes. One could implement the DBLoader interface and provide that as a
> > > plugin to the Enrichment Operator at compile time.
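A sketch of such a plugin, assuming a simplified contract: `SimpleDBLoader` below keeps the proposed `setFields(lookupFields, includeFields)` method and adds a hypothetical `get(List<Object>)` lookup, but leaves out `CacheManager.Backup`, whose methods are not listed in this thread. `InMemoryLoader` is an illustrative store, not part of malhar.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-in for the proposed DBLoader plugin contract.
// The real proposal extends CacheManager.Backup; that part is omitted here.
interface SimpleDBLoader {
  void setFields(List<String> lookupFields, List<String> includeFields);

  // Returns the include-field values for the given lookup key, or null.
  List<Object> get(List<Object> lookupKey);
}

// Example plugin: an in-memory "store" keyed by the lookup-field values.
class InMemoryLoader implements SimpleDBLoader {
  private final Map<List<Object>, Map<String, Object>> rows = new HashMap<>();
  private List<String> includeFields = new ArrayList<>();

  // Test helper for populating the toy store.
  void addRow(List<Object> key, Map<String, Object> row) {
    rows.put(key, row);
  }

  @Override
  public void setFields(List<String> lookupFields, List<String> includeFields) {
    // lookupFields is not needed here: the toy store is already keyed by the
    // lookup values. Only the fields to return are remembered.
    this.includeFields = new ArrayList<>(includeFields);
  }

  @Override
  public List<Object> get(List<Object> lookupKey) {
    Map<String, Object> row = rows.get(lookupKey);
    if (row == null) {
      return null;
    }
    // Project the stored row onto the configured include fields, in order.
    List<Object> values = new ArrayList<>();
    for (String field : includeFields) {
      values.add(row.get(field));
    }
    return values;
  }
}
```

At application build time, an instance like this would be handed to the operator in place of a JDBC, HBase or file loader.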
> > >
> > >
> > > On Tue, Mar 15, 2016 at 7:17 PM, Mohit Jotwani <mohit@datatorrent.com>
> > > wrote:
> > >
> > > > This is one of the most important features required within a pipeline.
> > > >
> > > > Will it allow other store plugins (for reference lookup) to be added to
> > > > the operator?
> > > >
> > > > +1
> > > >
> > > > Regards,
> > > > Mohit
> > > >
> > > > On Tue, Mar 15, 2016 at 6:57 PM, Chinmay Kolhatkar <chinmay@apache.org>
> > > > wrote:
> > > >
> > > > > Hello Community,
> > > > >
> > > > > We want to add an Enrichment operator to the malhar library.
> > > > >
> > > > > Here are some initial details about it:
> > > > >
> > > > > UseCase:
> > > > > =================
> > > > > Data enrichment is an extremely common and important step in almost ALL
> > > > > batch and stream processing flows.
> > > > > Streaming use cases deal with log data, which often lacks context and
> > > > > metadata. That metadata is required for any further analytical
> > > > > processing. This operator allows one to enrich stream data with data
> > > > > from an external source.
> > > > >
> > > > > Functionality:
> > > > > =================
> > > > > 1. Take a POJO as input and emit an enriched POJO as per the
> > > > > configuration.
> > > > > 2. The external store will be configurable and follow a plugin model.
> > > > > 3. For now, support for JDBC, HBase and file-based stores will be added.
> > > > > 4. The operator will perform a reference lookup against these external
> > > > > databases to enrich the incoming tuple.
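The four points above boil down to a simple per-tuple flow: read the lookup key from the incoming POJO, query the configured store, and emit a new POJO that carries the extra fields. This sketch hard-codes one hypothetical schema (`LogEvent` in, `EnrichedEvent` out) and models the pluggable store as a plain `Function`; it illustrates the idea and is not the proposed operator.

```java
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical input POJO: a log record that only carries a user id.
class LogEvent {
  final int userId;
  final String action;

  LogEvent(int userId, String action) {
    this.userId = userId;
    this.action = action;
  }
}

// Hypothetical output POJO: the same record plus looked-up context.
class EnrichedEvent {
  final int userId;
  final String action;
  final String userName; // null when the store has no matching row

  EnrichedEvent(int userId, String action, String userName) {
    this.userId = userId;
    this.action = action;
    this.userName = userName;
  }
}

// Sketch of the enrichment flow; `store` stands in for a pluggable external
// store and `output` for the operator's output port.
class EnrichmentFlow {
  private final Function<Integer, Map<String, Object>> store;
  private final Consumer<EnrichedEvent> output;

  EnrichmentFlow(Function<Integer, Map<String, Object>> store, Consumer<EnrichedEvent> output) {
    this.store = store;
    this.output = output;
  }

  // Handles one incoming tuple: reference lookup, then emit the enriched POJO.
  void process(LogEvent in) {
    Map<String, Object> row = store.apply(in.userId);
    String name = (row == null) ? null : (String) row.get("name");
    output.accept(new EnrichedEvent(in.userId, in.action, name));
  }
}
```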
> > > > >
> > > > > Design:
> > > > > =================
> > > > > 1. As mentioned above, the stores (viz. database loaders) will follow a
> > > > > plugin-based mechanism.
> > > > > 2. To make the loaders pluggable, they'll follow a common interface:
> > > > >      import java.util.List;
> > > > >
> > > > >      public interface DBLoader extends
> > > > >          com.datatorrent.lib.db.cache.CacheManager.Backup
> > > > >      {
> > > > >        // lookupFields: fields used as the lookup key in the store
> > > > >        // includeFields: fields fetched from the store into the tuple
> > > > >        void setFields(List<String> lookupFields, List<String> includeFields);
> > > > >
> > > > >        void setFieldInfo(List<FieldInfo> fieldInfos);
> > > > >      }
> > > > > 3. All the above-mentioned loaders (JDBC, HBase, file, etc.) will
> > > > > implement this interface, and the Enrichment Operator will use an
> > > > > object of this interface to query the missing fields to be enriched.
> > > > > 4. Both the input and output ports of the Enrichment Operator will need
> > > > > the TUPLE_CLASS attribute set, so that the operator knows the upstream
> > > > > and downstream tuple types.
> > > > > This means the input schema can be separate from the output schema.
> > > > > 5. The Enrichment operator will use PojoUtils to create getters and
> > > > > setters, which will be used to generate the new enriched object on the
> > > > > fly.
> > > > > 6. Users need to configure enrichmentMap whenever a
> > > > > columnName/inputFieldName maps to a differently named outputField.
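Items 4 to 6 can be illustrated with plain Java reflection standing in for PojoUtils, whose actual API is not shown here. `OrderIn` and `OrderOut` play the roles of the input and output TUPLE_CLASS types, and `enrichmentMap` is assumed to map a store column name to an output field name. The sketch copies fields present in both classes, then writes each looked-up column into its mapped field; `ReflectiveEnricher` and `enrich` are hypothetical names.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical input schema (the TUPLE_CLASS of the input port).
class OrderIn {
  public int customerId;
  public double amount;
}

// Hypothetical output schema (the TUPLE_CLASS of the output port); it differs
// from the input schema by the extra enriched field.
class OrderOut {
  public int customerId;
  public double amount;
  public String customerName;
}

class ReflectiveEnricher {
  private final Class<?> outputClass;
  // Store column name -> output field name (assumed shape of enrichmentMap).
  private final Map<String, String> enrichmentMap;

  ReflectiveEnricher(Class<?> outputClass, Map<String, String> enrichmentMap) {
    this.outputClass = outputClass;
    this.enrichmentMap = enrichmentMap;
  }

  // Builds a new output object: copies public fields that exist in both
  // classes, then sets each looked-up column on its mapped output field.
  Object enrich(Object input, Map<String, Object> lookedUp) throws ReflectiveOperationException {
    Object out = outputClass.getDeclaredConstructor().newInstance();
    for (Field inField : input.getClass().getFields()) {
      try {
        Field outField = outputClass.getField(inField.getName());
        outField.set(out, inField.get(input));
      } catch (NoSuchFieldException e) {
        // Input-only field: not part of the output schema, so skip it.
      }
    }
    for (Map.Entry<String, Object> column : lookedUp.entrySet()) {
      // Fall back to the column name when no mapping is configured.
      String target = enrichmentMap.getOrDefault(column.getKey(), column.getKey());
      outputClass.getField(target).set(out, column.getValue());
    }
    return out;
  }
}
```

Creating getters and setters once up front, as the proposal does with PojoUtils, would avoid the per-tuple reflective lookups this sketch performs.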
> > > > >
> > > > > Please share your feedback on the above.
> > > > >
> > > > > Thanks in advance,
> > > > > Chinmay.
> > > > >
> > > >
> > >
> >
>
