flink-user mailing list archives

From Ahmad Hassan <ahmad.has...@gmail.com>
Subject Re: How to implement Multi-tenancy in Flink
Date Thu, 05 Jul 2018 10:23:36 GMT
Hi Chesnay,

Yes, this is something we would eventually be doing, and we would then maintain
the configuration of which tenants are mapped to which Flink jobs.

This would reduce the number of Flink jobs to maintain in order to support
1000s of tenants in our use case.
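
As an illustration of such a tenant-to-job configuration, here is a minimal
sketch in plain Python (not Flink API). It assumes a fixed capacity of 100
tenants per job, as suggested in the quoted message below; the class name
TenantJobMapping and the "fill the first job with free capacity" policy are
hypothetical choices, not something the thread prescribes.

```python
class TenantJobMapping:
    """Keeps track of which tenant is served by which job (hypothetical helper)."""

    def __init__(self, tenants_per_job=100):
        self.tenants_per_job = tenants_per_job
        self.job_of = {}      # tenant id -> job index
        self.tenants_in = []  # job index -> set of tenant ids

    def assign(self, tenant_id):
        """Return the job index for a tenant, assigning one if it is new."""
        if tenant_id in self.job_of:
            return self.job_of[tenant_id]
        # Reuse the first job that still has free capacity.
        for job, tenants in enumerate(self.tenants_in):
            if len(tenants) < self.tenants_per_job:
                break
        else:
            # Every existing job is full (or none exists yet): add a new one.
            self.tenants_in.append(set())
            job = len(self.tenants_in) - 1
        self.tenants_in[job].add(tenant_id)
        self.job_of[tenant_id] = job
        return job


mapping = TenantJobMapping(tenants_per_job=100)
for i in range(1000):
    mapping.assign(f"tenant-{i}")
print(len(mapping.tenants_in))  # number of jobs needed for 1000 tenants
```

Existing tenants keep their job when new tenants are added, so only the job
that receives a new tenant would need to be touched.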

Thanks.

On Wed, 4 Jul 2018 at 12:00, Chesnay Schepler <chesnay@apache.org> wrote:

> Would it be feasible for you to partition your tenants across jobs, for
> example 100 customers per job?
>
> On 04.07.2018 12:53, Ahmad Hassan wrote:
>
> Hi Fabian,
>
> A one-job-per-tenant model soon becomes hard to maintain. For example, 1000
> tenants would require 1000 Flink jobs, and providing HA and resilience for
> 1000 jobs is not trivial.
>
> This is why we are hoping to have a single Flink job handle all the tenants
> through keyBy on tenant. However, this also does not scale with a growing
> number of tenants, since it puts all the load on a single Flink job.
>
> So I was wondering how other users are handling multitenancy in Flink at
> scale.
>
> Best Regards,
>
> On Wed, 4 Jul 2018 at 11:40, Fabian Hueske <fhueske@gmail.com> wrote:
>
>> Hi Ahmad,
>>
>> Some tricks that might help to bring down the effort per tenant if you
>> run one job per tenant (or key per tenant):
>>
>> - Pre-aggregate records in a 5-minute tumbling window. However,
>> pre-aggregation does not work for FoldFunctions.
>> - Implement the window as a custom ProcessFunction that maintains a state
>> of 288 events and aggregates and retracts the pre-aggregated records.
>>
>> Best, Fabian
>>
>>
>> 2018-07-03 15:22 GMT+02:00 Ahmad Hassan <ahmad.hassan@gmail.com>:
>>
>>> Hi Folks,
>>>
>>> We are using Flink to capture various interactions of a customer with an
>>> e-commerce store, e.g. product views and orders created. We run a 24-hour
>>> sliding window with a 5-minute slide, which makes 288 parallel windows for
>>> a single tenant. We implement a fold function that keeps various hashmaps
>>> and updates the customer statistics from the incoming e-commerce events
>>> one by one. As soon as an event arrives, the fold function updates the
>>> statistics in the hashmaps.
>>>
>>> Considering 1000 tenants, we have two solutions in mind:
>>>
>>> 1) Implement a Flink job per tenant, so 1000 tenants would create 1000
>>> Flink jobs.
>>>
>>> 2) Implement a single Flink job with keyBy 'tenant' so that each tenant
>>> gets separate windows. But this will end up creating 1000 * 288 = 288,000
>>> windows in a 24-hour period, which would put extra load on a single Flink
>>> job.
>>>
>>> What is the recommended approach to handle multitenancy in Flink at such
>>> a big scale, with over 1000 tenants, while storing the fold state for each
>>> event? Solution 1 would require significant effort to keep track of 1000
>>> Flink jobs and provide resilience.
>>>
>>> Thanks.
>>>
>>> Best Regards,
>>>
>>
>>
>
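
The two tricks Fabian describes above (pre-aggregating into 5-minute tumbling
buckets, then keeping the last 288 buckets and retracting the oldest) can be
sketched in plain Python. This is not Flink code, only an illustration of the
logic under stated assumptions: events are (timestamp_seconds, event_type)
pairs, the aggregate is a simple per-type count (something that can be
retracted by subtraction), and the names pre_aggregate and SlidingAggregate
are hypothetical. In a keyed job there would be one such aggregate per tenant.

```python
from collections import Counter, deque

BUCKET_SECONDS = 5 * 60
BUCKETS_PER_WINDOW = 24 * 60 * 60 // BUCKET_SECONDS  # 288 buckets in 24 hours


def pre_aggregate(events):
    """Count events per type inside 5-minute tumbling buckets."""
    buckets = {}
    for ts, event_type in events:
        start = ts - ts % BUCKET_SECONDS  # start of the bucket containing ts
        buckets.setdefault(start, Counter())[event_type] += 1
    return buckets


class SlidingAggregate:
    """24-hour totals maintained from at most 288 pre-aggregated buckets."""

    def __init__(self, max_buckets=BUCKETS_PER_WINDOW):
        self.max_buckets = max_buckets
        self.buckets = deque()
        self.totals = Counter()

    def add_bucket(self, counts):
        """Add the newest bucket, retract the expired one, return the totals."""
        self.buckets.append(counts)
        self.totals.update(counts)          # aggregate the new bucket
        if len(self.buckets) > self.max_buckets:
            expired = self.buckets.popleft()
            self.totals.subtract(expired)   # retract the oldest bucket
        return dict(+self.totals)           # unary + drops zero counts


agg = SlidingAggregate()
buckets = pre_aggregate([(0, "view"), (10, "view"), (299, "order"), (300, "view")])
for start in sorted(buckets):
    print(start, agg.add_bucket(buckets[start]))
```

The sketch assumes one bucket is fed per 5-minute interval, including an
empty Counter for intervals without events; otherwise the 288-bucket limit
would no longer correspond to 24 hours.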
