flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: expose side output stream
Date Thu, 11 Aug 2016 12:40:09 GMT
Hi!

This is a very big change, both in the semantics and in the runtime classes.
These changes are tricky to get in, and usually work best if you document
the changes and all implications well.

Something like a detailed design doc or a FLIP would be great for this.
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Greetings,
Stephan


On Thu, Aug 11, 2016 at 12:41 AM, Chen Qin <qinnchen@gmail.com> wrote:

> Hi there,
>
> I am thinking of implementing side outputs in Flink, which currently seems
> to lack support for them.
> https://cloud.google.com/dataflow/model/par-do#side-outputs
>
> It is useful because it lets pipeline authors redirect corrupted input or
> records that hit code bugs to a side stream, or write them to a table, and
> reconcile afterwards.
>
> After some hacky prototyping, I was able to get it working for simple tests.
> Basically, it allows env to register a side output typeInfo, which is
> passed to the configurations during graph building. It adds a new transform
> that is similar to the selection transform but holds a different input type.
> StreamEdge will have a boolean indicating whether it is a side output edge;
> if so, the output writer that is created loads the side output type
> serializer and emits records only when sideOutput is called.
>
> I have a problem passing the side output type as a type parameter to each
> data stream. It means every output stream would have to be exposed with two
> type parameters. As you can imagine, the API change will be sizable.
>
> Any suggestions?
>
> Chen
>
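
The routing idea in the quoted message (an edge flagged as a side output, and a collector that writes to it only when sideOutput is called) might be sketched in plain Java roughly as follows. This is a minimal illustration under stated assumptions, not Flink code: SketchEdge, SketchCollector, and SideOutputSketch are hypothetical names, the in-memory list stands in for the output writer and its serializer, and the two type parameters on the collector show the API cost the message mentions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a runtime edge; this is NOT Flink's StreamEdge.
final class SketchEdge {
    // The boolean from the proposal: true if this edge carries side output.
    final boolean sideOutput;
    // Stands in for the output writer plus the serializer of the edge's type.
    final List<Object> received = new ArrayList<>();

    SketchEdge(boolean sideOutput) {
        this.sideOutput = sideOutput;
    }
}

// Hypothetical collector. Note the second type parameter SIDE: every class
// that exposes the side output type would need it, hence the sizable API change.
final class SketchCollector<OUT, SIDE> {
    private final List<SketchEdge> edges;

    SketchCollector(List<SketchEdge> edges) {
        this.edges = edges;
    }

    // Main output: written only to edges that are not side output edges.
    void collect(OUT record) {
        for (SketchEdge edge : edges) {
            if (!edge.sideOutput) {
                edge.received.add(record);
            }
        }
    }

    // Side output: written only to edges flagged as side output edges.
    void sideOutput(SIDE record) {
        for (SketchEdge edge : edges) {
            if (edge.sideOutput) {
                edge.received.add(record);
            }
        }
    }
}

// Hypothetical user function: parsed integers go to the main output,
// corrupted input is redirected to the side output for later reconciliation.
final class SideOutputSketch {
    static void parseAll(List<String> inputs, SketchCollector<Integer, String> out) {
        for (String input : inputs) {
            try {
                out.collect(Integer.parseInt(input));
            } catch (NumberFormatException e) {
                out.sideOutput(input);
            }
        }
    }
}
```

With one main edge and one side output edge, feeding the strings "1", "x", "3" through parseAll leaves the two integers on the main edge and the unparseable "x" on the side edge.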
