apex-dev mailing list archives

From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: Including meta data with input tuples
Date Wed, 18 Nov 2015 05:39:29 GMT
OK, so in the worst case we'll have meta data followed by data for every
tuple.
However, in that case we need to include the meta data as part of the data
schema itself, so that the parser can process data and meta data in a
common way. This is similar to option 1 in the first email.
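
For illustration, a minimal sketch of what that could look like, assuming a
hypothetical line-record schema in which the source file name travels as an
ordinary field (the class and field names below are made up for the example):

    // Hypothetical record schema with the source file name folded in as a
    // regular field, so the parser treats the meta data like any other column.
    public class LineRecord {
      private String line;      // the actual record contents
      private String fileName;  // meta data: file the record was read from

      public String getLine() { return line; }
      public void setLine(String line) { this.line = line; }
      public String getFileName() { return fileName; }
      public void setFileName(String fileName) { this.fileName = fileName; }
    }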


Thanks.
Bhupesh

On Wed, Nov 18, 2015 at 11:02 AM, Gaurav Gupta <gaurav@datatorrent.com>
wrote:

> Bhupesh,
>
> No, it doesn’t stall anything… Meta data and data tuples go on the same
> port. Whenever the meta data changes, send the meta data first and then
> the tuples following it. So the first tuple that arrives with different
> meta data will trigger sending of the new meta data.
>
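
A minimal, illustrative sketch of this scheme, assuming the logic lives inside
an input operator (the MetaDataAwareEmitter and FileMetaData names are invented
for the example; only DefaultOutputPort is the actual Apex port class, and it is
typed to a common supertype so meta data and data tuples can share one port):

    import com.datatorrent.api.DefaultOutputPort;

    // Sketch only: meta data and data tuples share one output port; new meta
    // data is emitted first whenever it differs from the last meta data sent.
    public class MetaDataAwareEmitter {
      // Minimal meta data type for the example (e.g. the source file name).
      public static class FileMetaData {
        public String fileName;
        @Override
        public boolean equals(Object o) {
          return o instanceof FileMetaData && ((FileMetaData) o).fileName.equals(fileName);
        }
        @Override
        public int hashCode() { return fileName.hashCode(); }
      }

      public final transient DefaultOutputPort<Object> output = new DefaultOutputPort<Object>();
      private FileMetaData lastMetaData;  // last meta data emitted downstream

      public void emitRecord(String record, FileMetaData metaData) {
        if (!metaData.equals(lastMetaData)) {
          output.emit(metaData);   // meta data goes out first ...
          lastMetaData = metaData;
        }
        output.emit(record);       // ... then the tuples that follow it
      }
    }
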
> Thanks
> - Gaurav
>
> > On Nov 17, 2015, at 9:28 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
> > wrote:
> >
> > Depends on how "real time" the scenario is.
> > I think sending it only once during a window might work for some use
> > cases.
> > If my understanding is correct, this essentially stalls the processing of
> > a window until the meta data is available, which is not until the end
> > window of the upstream operator.
> >
> > Thanks
> > -Bhupesh
> >
> >
> > On Wed, Nov 18, 2015 at 10:54 AM, Gaurav Gupta <gaurav@datatorrent.com>
> > wrote:
> >
> >> Bhupesh,
> >>
> >> If the requirement is to send meta data with every tuple, then it should
> >> be sent with the data schema itself.
> >> Can sending meta data be optimized the way the platform does with
> >> DefaultStatefulStreamCodec? I mean, send the meta data only once in a
> >> window, and have all the tuples associated with this meta data carry that
> >> meta data’s id.
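
For illustration, a rough sketch of such an id-based scheme (all names below
are invented for the example; this is not the actual DefaultStatefulStreamCodec
mechanism): the meta data is registered once under an id, the tuples carry only
that id, and the receiving side resolves it from a local map.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: the full meta data is sent/stored once under an id, every
    // associated tuple just carries that id, and the receiving side resolves
    // the id back to the meta data from a local map.
    public class MetaDataRegistry<M> {
      private final Map<Integer, M> byId = new HashMap<Integer, M>();
      private int nextId = 0;

      // Sender side: register new meta data once and stamp its id on the tuples.
      public int register(M metaData) {
        int id = nextId++;
        byId.put(id, metaData);
        return id;
      }

      // Receiver side: map a tuple's metaDataId back to the full meta data.
      public M resolve(int metaDataId) {
        return byId.get(metaDataId);
      }
    }
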
> >>
> >> Thanks
> >> - Gaurav
> >>
> >>> On Nov 17, 2015, at 8:20 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
> >>> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> In the design of input modules, we are facing situations where we might
> >>> need to pass on some meta data to the downstream modules, in addition to
> >>> the actual data. Further, this meta data may need to be sent per record.
> >>> An example use case is to send a record and additionally send the file
> >>> name (as meta data) from which the record was read. Another example is
> >>> sending out the Kafka topic information along with the message.
> >>>
> >>> We are exploring options on:
> >>>
> >>>  1. Whether to include the meta information in the data schema, so as to
> >>>  allow the parser to handle this data as regular data. This will involve
> >>>  changing the schema of the data.
> >>>  2. Whether to handle meta data separately and modify the behaviour of the
> >>>  parser / converter to handle meta data separately as well.
> >>>  3. Use additional ports to transfer such meta data, depending on the
> >>>  different modules.
> >>>  4. Any other option.
> >>>
> >>> Please comment.
> >>>
> >>> Consolidating comments on another thread here:
> >>>
> >>>  1. Have the tuple contain two parts, with the downstream parser
> >>>  ignoring the meta data (see the sketch after this list):
> >>>     1. Data
> >>>     2. Meta-data
> >>>  2. Use option 1, but there is a concern regarding how unifiers will
> >>>  treat the meta data, if they need to unify that as well.
> >>>  3. Another comment is to have a centralized meta data repo. This may be
> >>>  in memory as well, maybe as a separate operator which stores and serves
> >>>  the meta data to other operators.
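
As a rough sketch of comment 1 above (the class name is invented for the
example), a two-part tuple could simply pair the data with its meta data; a
parser that does not care about meta data reads only getData():

    // Sketch only: a generic two-part tuple holding data and meta data together.
    public class TupleWithMetaData<D, M> {
      private final D data;
      private final M metaData;

      public TupleWithMetaData(D data, M metaData) {
        this.data = data;
        this.metaData = metaData;
      }

      public D getData() { return data; }          // what the parser consumes
      public M getMetaData() { return metaData; }  // ignored by meta-data-agnostic parsers
    }
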
> >>>
> >>> Thanks.
> >>>
> >>> -Bhupesh
> >>
> >>
>
>
