flink-user mailing list archives

From Chesnay Schepler <ches...@apache.org>
Subject Re: How to implement Multi-tenancy in Flink
Date Wed, 04 Jul 2018 11:00:22 GMT
Would it be feasible for you to partition your tenants across jobs, for
example 100 customers per job?
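
A rough sketch of what such a partitioning could look like, in plain Java with no Flink API involved. The class and method names (TenantBuckets, jobCount, jobFor) are made up for illustration, and hashing the tenant id is only one possible assignment strategy:

```java
public final class TenantBuckets {
    private TenantBuckets() {}

    /** Number of jobs needed if each job serves at most tenantsPerJob tenants. */
    public static int jobCount(int tenants, int tenantsPerJob) {
        return (tenants + tenantsPerJob - 1) / tenantsPerJob; // ceiling division
    }

    /** Stable job index in [0, jobs) for a tenant id (hypothetical hash-based assignment). */
    public static int jobFor(String tenantId, int jobs) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(tenantId.hashCode(), jobs);
    }
}
```

With 1000 tenants and 100 per job this gives 10 jobs. Hashing spreads tenants only approximately evenly, so an explicit tenant-to-job table may be the better choice if exact group sizes matter.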

On 04.07.2018 12:53, Ahmad Hassan wrote:
> Hi Fabian,
>
> The one-job-per-tenant model soon becomes hard to maintain. For example,
> 1000 tenants would require 1000 Flink jobs, and providing HA and
> resilience for 1000 jobs is not trivial.
>
> This is why we were hoping to have a single Flink job handle all the
> tenants through keyBy on the tenant. However, this does not scale with a
> growing number of tenants either, because it puts all the load on a
> single Flink job.
>
> So I was wondering how other users are handling multi-tenancy in Flink
> at scale.
>
> Best Regards,
>
> On Wed, 4 Jul 2018 at 11:40, Fabian Hueske <fhueske@gmail.com 
> <mailto:fhueske@gmail.com>> wrote:
>
>     Hi Ahmad,
>
>     Some tricks that might help to bring down the effort per tenant,
>     whether you run one job per tenant or one key per tenant:
>
>     - Pre-aggregate records in a 5-minute tumbling window. However,
>     pre-aggregation does not work for FoldFunctions.
>     - Implement the window as a custom ProcessFunction that maintains
>     a state of 288 pre-aggregated records (one per 5-minute slot of the
>     24 hours), adding new records to the aggregate and retracting
>     expired ones.
>
>     Best, Fabian
>
>
>     2018-07-03 15:22 GMT+02:00 Ahmad Hassan <ahmad.hassan@gmail.com
>     <mailto:ahmad.hassan@gmail.com>>:
>
>         Hi Folks,
>
>         We are using Flink to capture various interactions of a
>         customer with an e-commerce store, e.g. product views and
>         orders created. We run a 24-hour sliding window with a 5-minute
>         slide, which makes 288 parallel windows for a single tenant.
>         We implement a fold function that keeps various hashmaps of
>         customer statistics. As soon as an e-commerce event arrives,
>         the fold function updates the statistics in those hashmaps,
>         one event at a time.
>
>         Considering 1000 tenants, we have two solutions in mind:
>
>         1) Implement a Flink job per tenant. So 1000 tenants would
>         create 1000 Flink jobs.
>
>         2) Implement a single Flink job with keyBy 'tenant' so that
>         each tenant gets a separate window. But this will end up
>         creating 1000 * 288 windows in a 24-hour period, which would
>         put extra load on the single Flink job.
>
>         What is the recommended approach to handle multi-tenancy in
>         Flink at such a big scale, with over 1000 tenants, while
>         storing the fold state for each event? Solution 1 would
>         require significant effort to keep track of 1000 Flink jobs
>         and provide resilience.
>
>         Thanks.
>
>         Best Regards,
>
>
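
For readers of the archive: Fabian's second suggestion in the thread (keep 288 pre-aggregated records and add/retract them) can be sketched in plain Java as a ring buffer. This is only an illustration under assumptions, not code from the thread. The class name SlidingDayCounter is hypothetical, the per-slot aggregate is reduced to a single long count rather than the hashmaps Ahmad describes, and slots are assumed to arrive in non-decreasing order. In a real job this logic would presumably live in keyed state inside the custom ProcessFunction.

```java
public final class SlidingDayCounter {
    /** 24 hours split into 5-minute slots: 24 * 60 / 5 = 288. */
    public static final int SLOTS = 24 * 60 / 5;

    private final long[] slots = new long[SLOTS];
    private long total;
    private long newest = -1; // absolute index of the most recent slot seen

    /**
     * Adds one pre-aggregated 5-minute value and returns the 24-hour total.
     * slotIndex is assumed to be windowStartMillis / 300_000 (non-negative).
     */
    public long add(long slotIndex, long value) {
        if (slotIndex < 0 || slotIndex < newest) {
            throw new IllegalArgumentException("slots must be non-negative and in order");
        }
        // Retract every slot that drops out of the 24-hour range as time advances.
        long from = Math.max(newest + 1, slotIndex - SLOTS + 1);
        for (long i = from; i <= slotIndex; i++) {
            int pos = (int) (i % SLOTS);
            total -= slots[pos];
            slots[pos] = 0;
        }
        // Add the new pre-aggregated value to its slot and to the running total.
        slots[(int) (slotIndex % SLOTS)] += value;
        total += value;
        newest = slotIndex;
        return total;
    }

    public long total() {
        return total;
    }
}
```

Each update touches only the slots that expired since the previous update, so the per-tenant cost is one small array instead of 288 concurrently open windows.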

