flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: state size in relation to cluster size and processing speed
Date Tue, 21 Feb 2017 12:30:15 GMT
Hi Seth,
Sorry for taking so long to get back to you on this. I think my earlier
comment about the watermark may have been misleading; I no longer remember
what I was thinking back then.

Were you able to confirm that the results were in fact correct for the runs
with different parallelism? I know the results are not the same because
you process different amounts of data, but the correctness of each result
can still be confirmed.


On Fri, 16 Dec 2016 at 21:01 Seth Wiesman <swiesman@mediamath.com> wrote:


I’ve noticed something peculiar about the relationship between state size
and cluster size and was wondering if anyone here knows of the reason. I am
running a job with 1 hour tumbling event time windows which have an allowed
lateness of 7 days. When I run on a 20-node cluster with FsState I can
process approximately 1.5 days’ worth of data in an hour with the most
recent checkpoint being ~20 GB. Now if I run the same job with the same
configurations on a 40-node cluster I can process 2 days’ worth of data in
20 min (expected) but the state size is only ~8 GB. Because allowed lateness
is 7 days no windows should be purged yet and I would expect the larger
cluster which has processed more data to have a larger state. Is there some
reason why a slower-running job or a smaller cluster would require more state?
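
To make the expectation above concrete, here is a rough Python sketch (not
Flink code) of the retention arithmetic. It assumes a simplified model in
which a window's state is kept until the watermark reaches the window end
plus the allowed lateness, and it ignores keys, triggers, and the state
backend entirely. The function name live_windows and the idea of counting
windows per key are illustrative assumptions, not part of the actual job.

```python
from datetime import timedelta

# Values taken from the job description above.
WINDOW_SIZE = timedelta(hours=1)
ALLOWED_LATENESS = timedelta(days=7)


def live_windows(watermark: timedelta) -> int:
    """Count tumbling windows (per key) that have started and are not yet
    eligible for cleanup, given how far event time has advanced.

    Simplified model: a window [start, start + WINDOW_SIZE) is retained
    while watermark < start + WINDOW_SIZE + ALLOWED_LATENESS.
    """
    count = 0
    start = timedelta(0)
    while start < watermark:
        cleanup_time = start + WINDOW_SIZE + ALLOWED_LATENESS
        if watermark < cleanup_time:
            count += 1
        start += WINDOW_SIZE
    return count


print(live_windows(timedelta(days=1.5)))  # 36 windows after 1.5 days of data
print(live_windows(timedelta(days=2)))    # 48 windows after 2 days of data
```

Under this model nothing is purged until event time has advanced more than
7 days, so the run that processed 2 days of data should hold more windows
per key, not fewer. The sketch only restates the expectation; it does not
explain the observed difference in checkpoint size.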

This is more of a curiosity than an issue. Thanks in advance for any
insights you may have.

Seth Wiesman
