flink-user mailing list archives

From Congxian Qiu <qcx978132...@gmail.com>
Subject Re: State size Vs keys number performance
Date Wed, 08 Apr 2020 02:13:05 GMT
Hi,
Here is some information from my side:
1. RocksDB performance depends mainly on (de)serialization and disk
read/write.
2. MapState only needs to (de)serialize the single map key/map value being
accessed on a read/write, whereas ValueState needs to (de)serialize the
whole state value on every read/write.
3. Disk read/write cost depends to some extent on the total state size.
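
To make point 2 more concrete, here is a minimal stand-alone sketch. It does not use Flink or RocksDB at all; the class StateCostSketch, its method names, and the String/Long encoding are assumptions made up for illustration, and the real serializers will produce different byte counts. It only compares how many bytes have to be serialized when one entry changes: the whole map (as when the map is kept in a single ValueState) versus just that entry (as with MapState).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not Flink's API and not its serializers.
final class StateCostSketch {

    // ValueState-like layout: the whole map is one value, so every entry
    // is serialized even if only a single entry was modified.
    static int bytesForWholeValue(Map<String, Long> state) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(state.size());          // 4 bytes for the entry count
            for (Map.Entry<String, Long> e : state.entrySet()) {
                out.writeUTF(e.getKey());        // 2-byte length + encoded chars
                out.writeLong(e.getValue());     // 8 bytes
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.size();
    }

    // MapState-like layout: only the touched map key/map value is serialized.
    static int bytesForSingleEntry(String mapKey, long mapValue) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(mapKey);
            out.writeLong(mapValue);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.size();
    }

    public static void main(String[] args) {
        Map<String, Long> state = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            state.put("k" + i, (long) i);
        }
        System.out.println("whole value:  " + bytesForWholeValue(state) + " bytes");
        System.out.println("single entry: " + bytesForSingleEntry("k42", 42L) + " bytes");
    }
}
```

The whole-value cost grows with the number of entries, while the single-entry cost stays constant, which is why a large per-key state is usually better kept in MapState than in one ValueState.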

Best,
Congxian


On Wed, Apr 8, 2020 at 2:41 AM, KristoffSC <krzysiek.chmielewski@gmail.com> wrote:

> Hi,
> I would like to ask which has the larger memory footprint and which could
> be more efficient:
> fewer keys with a bigger keyState vs. many keys with a smaller keyState.
>
> For this use case I'm using the RocksDB StateBackend, and the state TTL is,
> well, infinite. So I'm keeping the state forever in Flink.
>
> The use case:
> I have a stream of messages that I have to process in some custom way.
> I can take one of two approaches:
>
> 1. Use a keyBy that gives me some number of distinct keys, but for each
> key the state size will be significant. It will be MapState in this case.
> This keyBy will still give me the ability to spread operations across
> operator instances.
>
> 2. In the second approach I can use a different keyBy, where I would have
> a huge number of distinct keys, but each keyState will be very small, and
> it will be a ValueState in this case.
>
> To sum up:
> a "reasonable" number of keys with a very big keyState each VS a huge
> number of keys with a very small state each.
>
> What are the pros and cons for both?
