flink-user mailing list archives

From Konstantin Knauf <konstantin.kn...@tngtech.com>
Subject Re: Large Numbers of Dynamically Created Jobs
Date Tue, 22 Mar 2016 21:17:46 GMT
Hi David,

Interesting use case. I think this can be done nicely with a comap. Let
me know if you run into problems; unfortunately, I am not aware of any
open source examples.
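
A minimal sketch of the idea, written in plain Python rather than against
the Flink API, since the thread points to no existing example. It mimics
the two callbacks of a co-flatmap style operator: one input carries the
control requests, the other carries the events. The control message fields
(`jobid`, `action`, `events`), the class name `AlertRouter`, and the helper
`net_change` are assumptions made for illustration; only `customerid` and
`event` come from the messages quoted below.

```python
import json
from collections import defaultdict


class AlertRouter:
    """Plain-Python stand-in for a two-input (co-flatmap style) operator."""

    def __init__(self):
        # State: customerid -> {jobid: set of event names that job wants}.
        self.active_jobs = defaultdict(dict)

    def flat_map1(self, control_json):
        """Control input: apply a job creation/cancellation request."""
        req = json.loads(control_json)
        jobs = self.active_jobs[req["customerid"]]
        if req["action"] == "create":      # hypothetical field and value
            jobs[req["jobid"]] = set(req["events"])
        elif req["action"] == "cancel":    # hypothetical field and value
            jobs.pop(req["jobid"], None)
        return []                          # control messages emit nothing

    def flat_map2(self, event_json):
        """Event input: emit one (jobid, event) record per interested job."""
        event = json.loads(event_json)
        jobs = self.active_jobs.get(event["customerid"], {})
        return [(job_id, event) for job_id, wanted in jobs.items()
                if event["event"] in wanted]


# Hypothetical mapping from event name to counter change.
DELTAS = {"addAdmin": 1, "deleteAdmin": -1}


def net_change(records):
    """Downstream step: sum the counter changes per job."""
    totals = defaultdict(int)
    for job_id, event in records:
        totals[job_id] += DELTAS.get(event["event"], 0)
    return dict(totals)


router = AlertRouter()
router.flat_map1(json.dumps({
    "customerid": 1, "jobid": "admins", "action": "create",
    "events": ["addAdmin", "deleteAdmin"],
}))

messages = [
    '{"customerid": 1, "event": "addAdmin"}',
    '{"customerid": 1, "event": "addAdmin"}',
    '{"customerid": 1, "event": "deleteAdmin"}',
]
routed = [rec for msg in messages for rec in router.flat_map2(msg)]
net = net_change(routed)  # change relative to the count before these messages
```

In a real job the dictionary would presumably live in operator state so
that it survives failures, and the control requests would need to reach
every parallel instance that can see the matching events.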

Cheers,

Konstantin

On 22.03.2016 21:07, David Brelloch wrote:
> Konstantin,
> 
> For now the jobs will largely just involve incrementing or decrementing
> counters based on the JSON message coming in. We will probably look at
> adding windowing later, but for now that isn't a huge priority.
> 
> As an example of what we are looking to do, let's say the following
> 3 messages were read from the kafka topic:
> {"customerid": 1, "event": "addAdmin"}
> {"customerid": 1, "event": "addAdmin"}
> {"customerid": 1, "event": "deleteAdmin"}
> 
> If the customer with id of 1 had said they care about that type of
> message we would expect to be tracking the number of admins and notify
> them that they currently have 2. The events are obviously much more
> complicated than that and they are not uniform but that is the general
> overview.
> 
> I will take a look at using the comap operator. Do you know of any
> examples where it is doing something similar? From a quick look, I am not
> seeing it used anywhere outside of tests, where it is largely just
> unifying the data coming in.
> 
> I think accumulators will at least be a reasonable starting place for us,
> so thank you for pointing me in that direction.
> 
> Thanks for your help!
> 
> David
> 
> On Tue, Mar 22, 2016 at 3:27 PM, Konstantin Knauf
> <konstantin.knauf@tngtech.com> wrote:
> 
>     Hi David,
> 
>     I have no idea how many parallel jobs are possible in Flink, but
>     generally speaking I do not think this approach will scale, because you
>     will always have only one job manager for coordination. But there is
>     definitely someone on the list who can tell you more about this.
> 
>     Regarding your 2nd question: could you go into more detail about what
>     the jobs will do? Without knowing any details, I think a control kafka
>     topic which contains the "job creation/cancellation requests" of the
>     users, in combination with a comap operator, is the better solution here.
>     You could keep the currently active "jobs" as state in the comap and
>     emit one record of the original stream per active user job, together with
>     some indicator of how to process it based on the request. What are your
>     concerns with respect to insight into the process? I think with some nice
>     accumulators you could get a good idea of what is going on. On the other
>     hand, if I think about monitoring 1000s of jobs, I am actually not so
>     sure ;)
> 
>     Cheers,
> 
>     Konstantin
> 
>     On 22.03.2016 19:16, David Brelloch wrote:
>     > Hi all,
>     >
>     > We are currently evaluating flink for processing kafka messages and are
>     > running into some issues. The basic problem we are trying to solve is
>     > allowing our end users to dynamically create jobs to alert based off the
>     > messages coming from kafka. At launch we figure we need to support at
>     > least 15,000 jobs (3000 customers with 5 jobs each). I have the example
>     > kafka job running and it is working great. The questions I have are:
>     >
>     >  1. On my local machine (admittedly much less powerful than we would be
>     >     using in production) things fall apart once I get to around 75 jobs.
>     >     Can flink handle a situation like this where we are looking at
>     >     thousands of jobs?
>     >  2. Is this approach even the right way to go? Is there a different
>     >     approach that would make more sense? Everything will be listening to
>     >     the same kafka topic so the other thought we had was to have 1 job
>     >     that processed everything and was configured by a separate control
>     >     kafka topic. The concern we had there was we would almost completely
>     >     lose insight into what was going on if there was a slow down.
>     >  3. The current approach we are using for creating dynamic jobs is
>     >     building a common jar and then starting it with the configuration
>     >     data for the individual job. Does this sound reasonable?
>     >
>     >
>     > If any of these questions are answered elsewhere I apologize. I couldn't
>     > find any of this being discussed elsewhere.
>     >
>     > Thanks for your help.
>     >
>     > David
> 
>     --
>     Konstantin Knauf * konstantin.knauf@tngtech.com * +49-174-3413182
>     TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>     Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>     Sitz: Unterföhring * Amtsgericht München * HRB 135082
> 
> 

-- 
Konstantin Knauf * konstantin.knauf@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082
