flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: Why would indefinitely growing state an issue for Flink while doing stream to stream joins?
Date Fri, 17 Jan 2020 09:10:42 GMT

Large state is mainly an issue for Flink's fault tolerance mechanism which
is based on periodic checkpoints, which means that the state is copied to a
remote storage system in regular intervals.
In case of a failure, the state copy needs to be loaded which takes more
time with growing state size.
There are a few features of Flink that reduce the cost of large state, like
incremental checkpoints and local recovery.
However, in general is large state more difficult to handle than small

If your application needs to persists state forever to run a join with
correct semantics, than this can be fine.
However, you should roughly assess how fast your state will be growing and
prepare your application to be able to scale to more machines (configure
max-parallelism) when the limits of your current setup are reached.

Best, Fabian

Am Do., 16. Jan. 2020 um 16:07 Uhr schrieb kant kodali <kanth909@gmail.com>:

> Hi All,
> The doc
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/joins.html#regular-joins
> the following.
> "However, this operation has an important implication: it requires to
> keep both sides of the join input in Flinkā€™s state forever. Thus, the
> resource usage will grow indefinitely as well, if one or both input tables
> are continuously growing"
> I wonder why this would be an issue especially when the state is stored in
> RocksDB which in turn is backed by disk?
> I have a use case where I might need to do stream-stream join or some
> emulation of that across say 6 or more tables and I don't know for sure how
> long I need to keep the state because a row today can join with a row a
> year or two years from now. will that be an issue? do I need to think about
> designing a solution in another way without using stream-stream join?
> Thanks!

View raw message