flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Can I use a lot of keyed states or should I use 1 big keyed state?
Date Wed, 09 Aug 2017 14:07:04 GMT
Hi,

If you have one keyed state, say "count by email id", and many different keys, you will still
have only one column family in RocksDB (or one hash table). In fact, a lot of users have
hundreds of millions of different keys for some states.
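
A rough way to picture this, as a plain-Java toy model and not Flink's actual API or storage code: the class name ToyKeyedBackend and its methods are made up for illustration. Each registered state name gets exactly one table, and every distinct key is just a row in that table.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a keyed state backend (hypothetical, not Flink API). */
final class ToyKeyedBackend {
    // One table per registered state name; stands in for a column family or hash table.
    private final Map<String, Map<String, Long>> tables = new HashMap<>();

    /** Adds 1 to the counter stored under (stateName, key) and returns the new value. */
    long increment(String stateName, String key) {
        Map<String, Long> table = tables.computeIfAbsent(stateName, n -> new HashMap<>());
        return table.merge(key, 1L, Long::sum);
    }

    /** Number of tables, i.e. number of distinct state names seen so far. */
    int tableCount() {
        return tables.size();
    }

    /** Number of rows (distinct keys) in the table of the given state. */
    int rowCount(String stateName) {
        Map<String, Long> table = tables.get(stateName);
        return table == null ? 0 : table.size();
    }

    public static void main(String[] args) {
        ToyKeyedBackend backend = new ToyKeyedBackend();
        for (int i = 0; i < 10; i++) {
            backend.increment("count by email id", "user" + i + "@example.com");
        }
        System.out.println(backend.tableCount());                  // 1
        System.out.println(backend.rowCount("count by email id")); // 10
    }
}
```

In this model, ten email ids or ten million change only the row count of the one table; the table count grows only when another state name is registered.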

Best,
Aljoscha 
> On 2. Aug 2017, at 14:59, shashank agarwal <shashank734@gmail.com> wrote:
> 
> Suppose I create a keyed state ("count by email id") and the keyed stream has 10 unique
> email IDs.
> 
> Will it create 1 column family or hash table?
> 
> Or will it create 10 column families or hash tables?
> 
> Can I have millions of unique email IDs in that keyed state?
> 
> 
> 
> On Tue, Aug 1, 2017 at 2:59 AM, shashank agarwal <shashank734@gmail.com> wrote:
> OK, let me check my understanding with an example:
> 
> If I create a keyed state named "total count by email" for the key (project id + email),
> then it will create a single hash table or column family "total count by email", all the
> unique email IDs will be rows of that single hash table or column family, and I can store
> millions of unique email IDs in it.
> 
> Does that mean it will create only a single state object for all unique email IDs?
> 
> 
> 
> 
> On Tue, Aug 1, 2017 at 1:53 AM, Stephan Ewen <sewen@apache.org> wrote:
> Each keyed state in Flink is a hash table or a column family in RocksDB. Having too many
> of those is not memory efficient.
> 
> Having fewer states is better, if you can adapt your schema that way.
> 
> I would also look into "MapState", which is an efficient way to have "sub keys" under
> a keyed state.
> 
> Stephan
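
To illustrate the "sub keys" idea, here is a plain-Java toy model and not Flink's MapState API: the class ToyMapState and the sub-key names "last_hour" and "last_day" are hypothetical. A single state holds, for each key, a small map of sub keys, so several per-key counters share one state instead of needing one state each. Expiring old data for rolling time ranges is not modelled here.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one keyed state with sub keys (hypothetical, not Flink API). */
final class ToyMapState {
    // A single state: each key (e.g. an email id) owns a small map of sub keys.
    private final Map<String, Map<String, Long>> rows = new HashMap<>();

    /** Adds delta to the counter stored under (key, subKey). */
    void add(String key, String subKey, long delta) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).merge(subKey, delta, Long::sum);
    }

    /** Returns the counter under (key, subKey), or 0 if nothing was stored. */
    long get(String key, String subKey) {
        Map<String, Long> sub = rows.get(key);
        return sub == null ? 0L : sub.getOrDefault(subKey, 0L);
    }

    /** Number of sub keys currently stored under the given key. */
    int subKeyCount(String key) {
        Map<String, Long> sub = rows.get(key);
        return sub == null ? 0 : sub.size();
    }

    public static void main(String[] args) {
        ToyMapState counts = new ToyMapState();
        counts.add("a@example.com", "last_hour", 1);
        counts.add("a@example.com", "last_day", 1);
        counts.add("a@example.com", "last_day", 1);
        System.out.println(counts.get("a@example.com", "last_day")); // 2
        System.out.println(counts.subKeyCount("a@example.com"));     // 2
    }
}
```

The alternative of registering one separate state per counter would, in the terms used above, mean one hash table or column family per counter.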
> 
> 
> On Mon, Jul 31, 2017 at 6:01 PM, shashank agarwal <shashank734@gmail.com> wrote:
> Hello,
> 
> I have to compute results based on a lot of historical data: parameters like total
> transactions in the last 1 month, last 1 day, last 1 hour, etc., by email ID, IP, mobile,
> name, address, zipcode, etc.
> 
> So my question is: is it the right approach to create keyed state by email, mobile,
> zipcode, etc., or should I create 1 big mapped state (BS) and then process that BS, maybe
> in a process function or by applying some loop and filter logic in a window or process
> function?
> 
> My main worry is that I will end up with millions of states, because there can be millions
> of unique emails, phone numbers, or zipcodes if I create keyed state by email, phone, etc.
> 
> Am I right? Does this impact performance, or is this the wrong approach? Which approach
> would you suggest for this use case?
> 
> 
> -- 
> Thanks Regards
> 
> SHASHANK AGARWAL
>  ---  Trying to mobilize the things....

