flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shaoxuan Wang <wshaox...@gmail.com>
Subject [Discuss] Retraction for Flink Streaming
Date Tue, 14 Mar 2017 15:53:14 GMT
Hello everyone,

Flink is widely used in Alibaba Group, especially in our Search and
Recommendation Infra. Retraction is one of the most important features that
we needed. We have spent lots of efforts to try to solve this problem, and
gladly at the end we develop an approach which can address most of
retraction problems in our production scenarios. Same as usual, we (Alibaba
search-data infra team) would like to share our retraction solution to the
entire Flink community. If you like this proposal, I would also like to
make it as one of the FLIPs. I am attaching the design doc of "Retraction
for Flink Streaming" as well as the introduction section below. I have also
created a master jira (FLINK-6047) to track the discussion and design of
the Flink retraction. All suggestions and comments are welcome.

*Design doc:*


"Retraction" is an important building block for data streaming to refine
the early fired results in streaming. “Early firing” are very common and
widely used in many streaming scenarios, for instance “window-less” or
unbounded aggregate and stream-stream inner join, windowed (with early
firing) aggregate and stream-stream inner join. As described in Streaming
102, there are mainly two cases that require retractions: 1) update on the
keyed table (the key is either a primaryKey (PK) on source table, or a
groupKey/partitionKey in an aggregate); 2) When dynamic windows (e.g.,
session window) are in use, the new value may be replacing more than one
previous window due to window merging.

To the best of our knowledge, the retraction for the early fired streaming
results has never been practically solved before. In this proposal, we
develop a retraction solution and explain how it works for the problem of
“update on the keyed table”. The same solution can be easily extended for
the dynamic windows merging, as the key component of retraction - how to
refine an early fired results - is the same across different problems.

*Master Jira: *


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message