apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chinmay Kolhatkar <chin...@apache.org>
Subject Re: Adding Enrichment operator to malhar
Date Tue, 15 Mar 2016 13:59:39 GMT
That's a great suggestion.
But before we do we need to figure out how can we hold a flowing stream
while a runtime configuration changes are happening.
I'm unaware of such feature in the platform.

Though, I suggest we should get above basic functionality into malhar in
the first phase of Enrichment operator, we can take care of this later?

Thanks,
Chinmay.


On Tue, Mar 15, 2016 at 7:22 PM, Mohit Jotwani <mohit@datatorrent.com>
wrote:

> Actually, I meant dynamically - will that be allowed?
>
> Regards,
> Mohit
>
> On Tue, Mar 15, 2016 at 7:18 PM, Chinmay Kolhatkar <chinmay@apache.org>
> wrote:
>
> > Yes. One could implement DBLoader interface and provide that as a plugin
> to
> > Enrichment Operator during compile time.
> >
> >
> > On Tue, Mar 15, 2016 at 7:17 PM, Mohit Jotwani <mohit@datatorrent.com>
> > wrote:
> >
> > > This is one of the most important features required within a pipeline.
> > >
> > > Will it allow other store plugins (for reference lookup) to be added to
> > the
> > > operator?
> > >
> > > +1
> > >
> > > Regards,
> > > Mohit
> > >
> > > On Tue, Mar 15, 2016 at 6:57 PM, Chinmay Kolhatkar <chinmay@apache.org
> >
> > > wrote:
> > >
> > > > Hello Community,
> > > >
> > > > We want to add Enrichment operator to malhar library.
> > > >
> > > > Here are some initial details about it:
> > > >
> > > > UseCase:
> > > > =================
> > > > Data enrichment is an extremely common and important step in almost
> ALL
> > > > batch and stream processing flows.
> > > > Streaming use cases deal with log data which often lacks context and
> > > > metadata. The metadata is required for all additional analytical
> > > processing
> > > > This operator allows one to enrich stream data with data from
> external
> > > > source.
> > > >
> > > > Functionality:
> > > > =================
> > > > 1. Take input as POJO and emit enriched POJO as per the
> configuration.
> > > > 2. The external store can be configurable and will be a plugin model.
> > > > 3. Currently support for JDBC, Hbase and File based format store will
> > be
> > > > added.
> > > > 4. Operator will perform a reference lookup to these external
> databases
> > > to
> > > > enrich the incoming tuple.
> > > >
> > > > Design:
> > > > =================
> > > > 1. As mentioned above stores (viz. Database Loaders) will be plugin
> > based
> > > > machanism.
> > > > 2. To make the loaders pluggable they'll follow a common interface as
> > > > follows:
> > > >      public interface DBLoader extends
> > > > com.datatorrent.lib.db.cache.CacheManager.Backup
> > > >      {
> > > >          public void setFields(List<String> lookupFields,List<String>
> > > > includeFields);
> > > >          public void setFieldInfo(List<FieldInfo> fieldInfos)
> > > >      }
> > > > 3. All the above mentioned loaders (JDBC, Hbase, file etc) will
> > implement
> > > > above interface and Enrichment Operator will use object of this
> > interface
> > > > to query missing fields to be enriched.
> > > > 4. Both input and output ports of Enrichment Operator will need to be
> > set
> > > > with TUPLE_CLASS Attribute for the operator to know of upstream and
> > > > downstream.
> > > > This means, input schema can be seperate from output schema.
> > > > 5. Enrichment operator will use PojoUtils to create getters and
> setters
> > > > which will be used to dynamically generate the new enriched object on
> > the
> > > > fly.
> > > > 6. User need to configure enrichmentMap for any change in
> > > > columnName/inputFieldName to outputField.
> > > >
> > > > Please provide your valuable feedback on above.
> > > >
> > > > Thanks in advance,
> > > > Chinmay.
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message