flink-user mailing list archives

From Konstantin Knauf <konstantin.kn...@tngtech.com>
Subject Re: Large Numbers of Dynamically Created Jobs
Date Tue, 22 Mar 2016 19:27:17 GMT
Hi David,

I don't know the practical limit on the number of parallel jobs in Flink,
but generally speaking I do not think this approach will scale, because
you will always have only one JobManager coordinating all of those jobs.
But there is definitely someone on the list who can tell you more about this.

Regarding your 2nd question: could you go into some more detail about
what the jobs will do? Without knowing any details, I think a control
Kafka topic, which carries the "job creation/cancellation requests" of
the users, in combination with a CoMap operator is the better solution
here. You could keep the currently active "jobs" as state in the CoMap
and emit one record of the original stream per active user-job, together
with an indicator of how to process it based on the request. What are
your concerns with respect to insight into the process? I think with
some well-chosen accumulators you could get a good idea of what is going
on; on the other hand, thinking about monitoring thousands of jobs, I am
actually not so sure ;)



On 22.03.2016 19:16, David Brelloch wrote:
> Hi all,
> We are currently evaluating flink for processing kafka messages and are
> running into some issues. The basic problem we are trying to solve is
> allowing our end users to dynamically create jobs to alert based on the
> messages coming from kafka. At launch we figure we need to support at
> least 15,000 jobs (3000 customers with 5 jobs each). I have the example
> kafka job running and it is working great. The questions I have are:
>  1. On my local machine (admittedly much less powerful than we would be
>     using in production) things fall apart once I get to around 75 jobs.
>     Can flink handle a situation like this where we are looking at
>     thousands of jobs?
>  2. Is this approach even the right way to go? Is there a different
>     approach that would make more sense? Everything will be listening to
>     the same kafka topic so the other thought we had was to have 1 job
>     that processed everything and was configured by a separate control
>     kafka topic. The concern we had there was we would almost completely
>     lose insight into what was going on if there was a slow down.
>  3. The current approach we are using for creating dynamic jobs is
>     building a common jar and then starting it with the configuration
>     data for the individual job. Does this sound reasonable?
> If any of these questions are answered elsewhere I apologize. I couldn't
> find any of this being discussed elsewhere.
> Thanks for your help.
> David

Konstantin Knauf * konstantin.knauf@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082
