flink-dev mailing list archives

From "Li, Chengxiang" <chengxiang...@intel.com>
Subject A proposal about skew data handling in Flink
Date Thu, 15 Oct 2015 10:24:59 GMT
Dear all,
In many real-world use cases, data is naturally skewed. For example, in a social network,
famous people get far more "follows" than others, a hot tweet may be forwarded millions
of times, and the purchase records of an ordinary product can never compare to those of a hot product.
At the same time, the Flink runtime assumes that all tasks consume the same amount of resources, which is
not always true. Skew data handling tries to make skewed data fit into Flink's runtime.
I have written a proposal about skew data handling in Flink; you can read it at https://docs.google.com/document/d/1ma060BUlhXDqeFmviEO7Io4CXLKgrAXIfeDYldvZsKI/edit?usp=sharing.
Any comments and feedback are welcome. You can comment on the Google Doc or reply to this email
thread directly.

Thanks
Chengxiang