apex-dev mailing list archives

From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: File control tuples
Date Sun, 04 Jun 2017 03:50:40 GMT
This is not specific to the batch work. It is more generic
functionality that even streaming applications can benefit from.

The separate port carries both the actual tuple and the metadata.
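A minimal Java sketch of the idea, assuming the control tuple is emitted on the same ordered port as the data, so it always arrives before the tuples it describes. Every name here (`PortEvent`, `FileStart`, `TaggedTuple`, `FileReaderSketch`, `DownstreamSketch`) is hypothetical and is not the Malhar `AbstractFileInputOperator` API. The port is modeled as a plain `java.util.function.Consumer`. The compact integer file id stands in for the "meta encoding": the full metadata travels once per file, and each data tuple carries only the id. Requires Java 16+ (records, pattern matching for `instanceof`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not the Malhar API: a file reader that emits, on one
 * ordered metadata port, a control tuple per file followed by data tuples
 * that carry only a compact file id instead of the full metadata.
 */
public class FileControlTupleSketch {

  /** Anything emitted on the (hypothetical) metadata port. */
  interface PortEvent {}

  /** Control tuple: sent once per file, binds a compact id to the metadata. */
  record FileStart(int fileId, String fileName, Map<String, String> metadata)
      implements PortEvent {}

  /** Data tuple: the actual record plus only the compact file id. */
  record TaggedTuple(int fileId, String tuple) implements PortEvent {}

  /** Stand-in for the input operator; the port is modeled as a Consumer. */
  static final class FileReaderSketch {
    private final Consumer<PortEvent> metadataPort;
    private int nextFileId = 0;

    FileReaderSketch(Consumer<PortEvent> metadataPort) {
      this.metadataPort = metadataPort;
    }

    void readFile(String fileName, Map<String, String> metadata, List<String> lines) {
      int id = nextFileId++;
      // The control tuple goes out first on the same port, so it precedes the data.
      metadataPort.accept(new FileStart(id, fileName, metadata));
      for (String line : lines) {
        metadataPort.accept(new TaggedTuple(id, line));
      }
    }
  }

  /** Downstream operator: decodes file ids using the control tuples seen so far. */
  static final class DownstreamSketch implements Consumer<PortEvent> {
    private final Map<Integer, FileStart> files = new HashMap<>();
    final List<String> resolved = new ArrayList<>();

    @Override
    public void accept(PortEvent event) {
      if (event instanceof FileStart start) {
        files.put(start.fileId(), start);
      } else if (event instanceof TaggedTuple t) {
        // Relies on the ordering assumption: the id was registered by a prior control tuple.
        resolved.add(files.get(t.fileId()).fileName() + ": " + t.tuple());
      }
    }
  }

  public static void main(String[] args) {
    DownstreamSketch downstream = new DownstreamSketch();
    FileReaderSketch reader = new FileReaderSketch(downstream);
    reader.readFile("a.csv", Map.of("owner", "etl"), List.of("x", "y"));
    reader.readFile("b.csv", Map.of(), List.of("z"));
    System.out.println(downstream.resolved); // [a.csv: x, a.csv: y, b.csv: z]
  }
}
```

Keeping the control tuple and the tagged tuples on one port is what lets the downstream operator relate them without any cross-port ordering guarantee. A fuller version would probably also emit an end-of-file control tuple so the downstream map can drop finished ids; that is an assumption, not part of the proposal.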

~ Bhupesh


_______________________________________________________

Bhupesh Chawda

E: bhupesh@datatorrent.com | Twitter: @bhupeshsc

www.datatorrent.com  |  apex.apache.org



On Sat, Jun 3, 2017 at 9:37 AM, Thomas Weise <thw@apache.org> wrote:

> How does this relate to the batch control tuples work?
>
> With a separate port, how can a downstream operator relate the metadata to
> the tuples emitted from the primary port?
>
> --
> sent from mobile
> On Jun 2, 2017 12:06 PM, "Bhupesh Chawda" <bhupesh@datatorrent.com> wrote:
>
> Hi,
>
> Emitting file information for a file-based source, like a file input
> operator in Malhar, seems like a good feature to provide. For instance, it
> is useful for any downstream operator to know that a data tuple belongs to
> a certain file.
>
>
> We propose to add the capability to emit file control tuples to the
> abstract file input operator. These control tuples can include file names
> as well as any metadata that the user wishes to include along with them.
>
> To link this metadata to each tuple, we can add another port to the input
> operator which would carry the metadata along with the actual tuple. We
> can try to reduce the amount of metadata that goes with each tuple by
> having some sort of metadata encoding in the control tuple.
>
>
> ~ Bhupesh
