flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jamie Grier <ja...@data-artisans.com>
Subject Re: Large Numbers of Dynamically Created Jobs
Date Tue, 22 Mar 2016 22:22:16 GMT
Hi David,

Here's an example of something similar to what you're talking about:
https://github.com/jgrier/FilteringExample

Have a look at the TweetImpressionFilteringJob.

-Jamie


On Tue, Mar 22, 2016 at 2:24 PM, David Brelloch <brelloch@gmail.com> wrote:

> Konstantin,
>
> Not a problem. Thanks for pointing me in the right direction.
>
> David
>
> On Tue, Mar 22, 2016 at 5:17 PM, Konstantin Knauf <
> konstantin.knauf@tngtech.com> wrote:
>
>> Hi David,
>>
>> interesting use case, I think, this can be nicely done with a comap. Let
>> me know if you run into problems, unfortunately I am not aware of any
>> open source examples.
>>
>> Cheers,
>>
>> Konstnatin
>>
>> On 22.03.2016 21:07, David Brelloch wrote:
>> > Konstantin,
>> >
>> > For now the jobs will largely just involve incrementing or decrementing
>> > based on the json message coming in. We will probably look at adding
>> > windowing later but for now that isn't a huge priority.
>> >
>> > As an example of what we are looking to do lets say the following
>> > 3 message were read from the kafka topic:
>> > {"customerid": 1, "event": "addAdmin"}
>> > {"customerid": 1, "event": "addAdmin"}
>> > {"customerid": 1, "event": "deleteAdmin"}
>> >
>> > If the customer with id of 1 had said they care about that type of
>> > message we would expect to be tracking the number of admins and notify
>> > them that they currently have 2. The events are obviously much more
>> > complicated than that and they are not uniform but that is the general
>> > overview.
>> >
>> > I will take a look at using the comap operator. Do you know of any
>> > examples where it is doing something similar? Quickly looking I am not
>> > seeing it used anywhere outside of tests where it is largely just
>> > unifying the data coming in.
>> >
>> > I think accumulators will at least be a reasonable starting place for us
>> > so thank your for pointing me in that direction.
>> >
>> > Thanks for your help!
>> >
>> > David
>> >
>> > On Tue, Mar 22, 2016 at 3:27 PM, Konstantin Knauf
>> > <konstantin.knauf@tngtech.com <mailto:konstantin.knauf@tngtech.com>>
>> wrote:
>> >
>> >     Hi David,
>> >
>> >     I have no idea how many parallel jobs are possible in Flink, but
>> >     generally speaking I do not think this approach will scale, because
>> you
>> >     will always only have one job manager for coordination. But there is
>> >     definitely someone on the list, who can tell you more about this.
>> >
>> >     Regarding your 2nd question. Could you go into some more details,
>> what
>> >     the jobs will do? Without knowing any details, I think a control
>> kafka
>> >     topic which contains the "job creation/cancellation requests" of the
>> >     users in combination with a comap-operator is the better solution
>> here.
>> >     You could keep the currently active "jobs" as state in the comap
>> and and
>> >     emit one record of the original stream per active user-job together
>> with
>> >     some indicator on how to process it based on the request. What are
>> your
>> >     concerns with respect to insight in the process? I think with some
>> nice
>> >     accumulators you could get a good idea of what is going on, on the
>> other
>> >     hand if I think about monitoring 1000s of jobs I am actually not so
>> >     sure ;)
>> >
>> >     Cheers,
>> >
>> >     Konstantin
>> >
>> >     On 22.03.2016 19 <tel:22.03.2016%2019>:16, David Brelloch wrote:
>> >     > Hi all,
>> >     >
>> >     > We are currently evaluating flink for processing kafka messages
>> and are
>> >     > running into some issues. The basic problem we are trying to
>> solve is
>> >     > allowing our end users to dynamically create jobs to alert based
>> off the
>> >     > messages coming from kafka. At launch we figure we need to
>> support at
>> >     > least 15,000 jobs (3000 customers with 5 jobs each). I have the
>> example
>> >     > kafka job running and it is working great. The questions I have
>> are:
>> >     >
>> >     >  1. On my local machine (admittedly much less powerful than we
>> >     would be
>> >     >     using in production) things fall apart once I get to around
>> 75 jobs.
>> >     >     Can flink handle a situation like this where we are looking at
>> >     >     thousands of jobs?
>> >     >  2. Is this approach even the right way to go? Is there a
>> different
>> >     >     approach that would make more sense? Everything will be
>> listening to
>> >     >     the same kafka topic so the other thought we had was to have
>> 1 job
>> >     >     that processed everything and was configured by a separate
>> control
>> >     >     kafka topic. The concern we had there was we would almost
>> completely
>> >     >     lose insight into what was going on if there was a slow down.
>> >     >  3. The current approach we are using for creating dynamic jobs is
>> >     >     building a common jar and then starting it with the
>> configuration
>> >     >     data for the individual job. Does this sound reasonable?
>> >     >
>> >     >
>> >     > If any of these questions are answered elsewhere I apologize. I
>> >     couldn't
>> >     > find any of this being discussed elsewhere.
>> >     >
>> >     > Thanks for your help.
>> >     >
>> >     > David
>> >
>> >     --
>> >     Konstantin Knauf * konstantin.knauf@tngtech.com
>> >     <mailto:konstantin.knauf@tngtech.com> * +49-174-3413182
>> >     <tel:%2B49-174-3413182>
>> >     TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> >     Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> >     Sitz: Unterföhring * Amtsgericht München * HRB 135082
>> >
>> >
>>
>> --
>> Konstantin Knauf * konstantin.knauf@tngtech.com * +49-174-3413182
>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>>
>
>


-- 

Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
jamie@data-artisans.com

Mime
View raw message