flink-user mailing list archives

From shashank agarwal <shashank...@gmail.com>
Subject Re: Can i use lot of keyd states or should i use 1 big key state.
Date Wed, 02 Aug 2017 12:59:40 GMT
If I create a keyed state ("count by email id") and the keyed stream has 10
unique email ids:

Will it create 1 column family or hash table?

Or will it create 10 column families or hash tables?

Can I have millions of unique email ids in that keyed state?
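
To make the question concrete, here is a toy model of how I read Stephan's
description ("each keyed state is a hashtable or a column family"). It is
plain Java, not Flink code: the class KeyedStateModel and its methods are
made up for illustration, and it only assumes that one table exists per
state name, with one row per key.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: this is NOT Flink's state backend implementation.
// Assumption: one table per registered state name, one row per key.
class KeyedStateModel {
    // state name -> (key -> count); each inner map stands in for one
    // hash table (heap backend) or one column family (RocksDB).
    private final Map<String, Map<String, Long>> tables = new HashMap<>();

    // Increment the counter stored under (stateName, key); returns the new value.
    long increment(String stateName, String key) {
        Map<String, Long> table =
                tables.computeIfAbsent(stateName, name -> new HashMap<>());
        return table.merge(key, 1L, Long::sum);
    }

    // Number of tables, i.e. number of distinct state names seen so far.
    int tableCount() {
        return tables.size();
    }

    // Number of rows (distinct keys) stored in the table for one state name.
    int rowCount(String stateName) {
        Map<String, Long> table = tables.get(stateName);
        return table == null ? 0 : table.size();
    }

    public static void main(String[] args) {
        KeyedStateModel model = new KeyedStateModel();
        // Ten unique email ids, all going through the same named state.
        for (int i = 0; i < 10; i++) {
            model.increment("count by email id", "user" + i + "@example.com");
        }
        System.out.println("tables: " + model.tableCount());
        System.out.println("rows:   " + model.rowCount("count by email id"));
    }
}
```

In this model, ten unique email ids give one table with ten rows, so the
number of tables grows with the number of state names and not with the
number of keys. Whether Flink behaves the same way is what I am asking.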



On Tue, Aug 1, 2017 at 2:59 AM, shashank agarwal <shashank734@gmail.com>
wrote:

> OK, taking that as correct, here is an example:
>
> If I create a keyed state named "total count by email" for
> key (project id + email), then it will create a single hash table or column
> family "total count by email", and all the unique email ids will be rows of
> that single hash table or column family, so I can store millions of
> unique email ids in it.
>
> Does that mean it will create only a single state object for all unique
> email ids?
>
>
>
>
> On Tue, Aug 1, 2017 at 1:53 AM, Stephan Ewen <sewen@apache.org> wrote:
>
>> Each keyed state in Flink is a hashtable or a column family in RocksDB.
>> Having too many of those is not memory efficient.
>>
>> Having fewer states is better, if you can adapt your schema that way.
>>
>> I would also look into "MapState", which is an efficient way to have "sub
>> keys" under a keyed state.
>>
>> Stephan
>>
>>
>> On Mon, Jul 31, 2017 at 6:01 PM, shashank agarwal <shashank734@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I have to compute results based on a lot of historical data: parameters
>>> like total transactions in the last 1 month, last 1 day, last 1 hour, etc.,
>>> by email id, IP, mobile, name, address, zipcode, etc.
>>>
>>> So my question is: is it the right approach to create keyed state by email,
>>> mobile, zipcode, etc., or should I create 1 big mapped state (BS) and then
>>> process that BS, maybe in a process function or by applying some loop and
>>> filter logic in a window or process function?
>>>
>>> My main worry is that I will end up with millions of states, because there
>>> can be millions of unique emails, phone numbers, or zipcodes if I create
>>> keyed state by email, phone, etc.
>>>
>>> Am I right? Does this impact performance, or is this the wrong approach?
>>> Which approach would you suggest for this use case?
>>>
>>>
>>> --
>>> Thanks Regards
>>>
>>> SHASHANK AGARWAL
>>>  ---  Trying to mobilize the things....
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> Thanks Regards
>
> SHASHANK AGARWAL
>  ---  Trying to mobilize the things....
>
>


-- 
Thanks Regards

SHASHANK AGARWAL
 ---  Trying to mobilize the things....
