apex-dev mailing list archives

From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: File control tuples
Date Sun, 04 Jun 2017 08:35:45 GMT
Yes, the idea is to associate every tuple with a reference to the metadata
sent in the control tuple.
That way, even with a partitioned input operator, a downstream operator can
distinguish between tuples from different files.

~ Bhupesh

On Jun 3, 2017 21:30, "Thomas Weise" <thw@apache.org> wrote:

The extra port seems unnecessary unless you are planning to associate each
individual data tuple with a file reference (similar to WindowedTuple)?

On Sat, Jun 3, 2017 at 8:50 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
wrote:

> This is not specific to the batch work. This is a more generic
> functionality which even streaming applications can benefit from.
>
> The separate port carries both the actual tuple and the metadata.
>
> ~ Bhupesh
>
>
> _______________________________________________________
>
> Bhupesh Chawda
>
> E: bhupesh@datatorrent.com | Twitter: @bhupeshsc
>
> www.datatorrent.com  |  apex.apache.org
>
>
>
> On Sat, Jun 3, 2017 at 9:37 AM, Thomas Weise <thw@apache.org> wrote:
>
> > How does this relate to the batch control tuples work?
> >
> > With a separate port, how can a downstream operator relate the metadata to
> > the tuples emitted from the primary port?
> >
> > --
> > sent from mobile
> > On Jun 2, 2017 12:06 PM, "Bhupesh Chawda" <bhupesh@datatorrent.com> wrote:
> >
> > Hi,
> >
> > Emitting file information for a file-based source, such as a file input
> > operator in Malhar, seems like a good feature to provide. For instance, it
> > is useful for any downstream operator to know that a data tuple belongs to
> > a certain file.
> >
> >
> > We propose to add the capability to the abstract file input operator to
> > emit file control tuples. These control tuples can include filenames as
> > well as any metadata that the user wishes to include along with them.
> >
> > To link this metadata to each tuple, we can add another port to the input
> > operator which would carry the metadata along with the actual tuple. We
> > can try to reduce the amount of metadata that goes with each tuple by
> > having some sort of meta encoding in the control tuple.
> >
> >
> > ~ Bhupesh
>
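
The linking scheme discussed in this thread (a control tuple carrying the full file metadata, and each data tuple carrying only a compact reference to it) could be sketched roughly as follows. This is an illustration only, not the proposed implementation: FileControlTuple, TaggedTuple, Resolver, and the integer fileId encoding are hypothetical names that do not exist in Apex or Malhar, and the sketch assumes file ids are unique across partitions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Apex or Malhar.
public class FileTagSketch {

    // Control tuple: sent once per file, carries the full metadata.
    static final class FileControlTuple {
        final int fileId;                    // compact reference used by data tuples
        final String fileName;
        final Map<String, String> metadata;  // any user-supplied metadata

        FileControlTuple(int fileId, String fileName, Map<String, String> metadata) {
            this.fileId = fileId;
            this.fileName = fileName;
            this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
        }
    }

    // Data tuple: carries only the compact reference plus the payload.
    static final class TaggedTuple<T> {
        final int fileId;
        final T payload;

        TaggedTuple(int fileId, T payload) {
            this.fileId = fileId;
            this.payload = payload;
        }
    }

    // Downstream side: remembers control tuples and resolves references.
    static final class Resolver {
        private final Map<Integer, FileControlTuple> files = new HashMap<>();

        void onControl(FileControlTuple control) {
            files.put(control.fileId, control);
        }

        // Returns the file name for a data tuple, or null if no control
        // tuple with that id has been seen yet.
        String fileNameOf(TaggedTuple<?> tuple) {
            FileControlTuple control = files.get(tuple.fileId);
            return control == null ? null : control.fileName;
        }
    }

    public static void main(String[] args) {
        Resolver resolver = new Resolver();
        // Two partitions read different files; each announces its file first.
        resolver.onControl(new FileControlTuple(1, "a.log", Map.of("owner", "p0")));
        resolver.onControl(new FileControlTuple(2, "b.log", Map.of("owner", "p1")));

        // Interleaved data tuples can still be told apart by their file id.
        System.out.println(resolver.fileNameOf(new TaggedTuple<>(1, "line from a")));
        System.out.println(resolver.fileNameOf(new TaggedTuple<>(2, "line from b")));
    }
}
```

Keeping only an integer id on each data tuple is one way to get the "meta encoding" mentioned above: the per-tuple overhead stays constant no matter how much metadata the control tuple carries.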
