flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Brelloch <brell...@gmail.com>
Subject Re: Large Numbers of Dynamically Created Jobs
Date Wed, 23 Mar 2016 00:38:55 GMT
Jamie,

That looks fantastic!

Thanks for the help.

David

On Tue, Mar 22, 2016 at 6:22 PM, Jamie Grier <jamie@data-artisans.com>
wrote:

> Hi David,
>
> Here's an example of something similar to what you're talking about:
> https://github.com/jgrier/FilteringExample
>
> Have a look at the TweetImpressionFilteringJob.
>
> -Jamie
>
>
> On Tue, Mar 22, 2016 at 2:24 PM, David Brelloch <brelloch@gmail.com>
> wrote:
>
>> Konstantin,
>>
>> Not a problem. Thanks for pointing me in the right direction.
>>
>> David
>>
>> On Tue, Mar 22, 2016 at 5:17 PM, Konstantin Knauf <
>> konstantin.knauf@tngtech.com> wrote:
>>
>>> Hi David,
>>>
>>> interesting use case, I think, this can be nicely done with a comap. Let
>>> me know if you run into problems, unfortunately I am not aware of any
>>> open source examples.
>>>
>>> Cheers,
>>>
>>> Konstnatin
>>>
>>> On 22.03.2016 21:07, David Brelloch wrote:
>>> > Konstantin,
>>> >
>>> > For now the jobs will largely just involve incrementing or decrementing
>>> > based on the json message coming in. We will probably look at adding
>>> > windowing later but for now that isn't a huge priority.
>>> >
>>> > As an example of what we are looking to do lets say the following
>>> > 3 message were read from the kafka topic:
>>> > {"customerid": 1, "event": "addAdmin"}
>>> > {"customerid": 1, "event": "addAdmin"}
>>> > {"customerid": 1, "event": "deleteAdmin"}
>>> >
>>> > If the customer with id of 1 had said they care about that type of
>>> > message we would expect to be tracking the number of admins and notify
>>> > them that they currently have 2. The events are obviously much more
>>> > complicated than that and they are not uniform but that is the general
>>> > overview.
>>> >
>>> > I will take a look at using the comap operator. Do you know of any
>>> > examples where it is doing something similar? Quickly looking I am not
>>> > seeing it used anywhere outside of tests where it is largely just
>>> > unifying the data coming in.
>>> >
>>> > I think accumulators will at least be a reasonable starting place for
>>> us
>>> > so thank your for pointing me in that direction.
>>> >
>>> > Thanks for your help!
>>> >
>>> > David
>>> >
>>> > On Tue, Mar 22, 2016 at 3:27 PM, Konstantin Knauf
>>> > <konstantin.knauf@tngtech.com <mailto:konstantin.knauf@tngtech.com>>
>>> wrote:
>>> >
>>> >     Hi David,
>>> >
>>> >     I have no idea how many parallel jobs are possible in Flink, but
>>> >     generally speaking I do not think this approach will scale,
>>> because you
>>> >     will always only have one job manager for coordination. But there
>>> is
>>> >     definitely someone on the list, who can tell you more about this.
>>> >
>>> >     Regarding your 2nd question. Could you go into some more details,
>>> what
>>> >     the jobs will do? Without knowing any details, I think a control
>>> kafka
>>> >     topic which contains the "job creation/cancellation requests" of
>>> the
>>> >     users in combination with a comap-operator is the better solution
>>> here.
>>> >     You could keep the currently active "jobs" as state in the comap
>>> and and
>>> >     emit one record of the original stream per active user-job
>>> together with
>>> >     some indicator on how to process it based on the request. What are
>>> your
>>> >     concerns with respect to insight in the process? I think with some
>>> nice
>>> >     accumulators you could get a good idea of what is going on, on the
>>> other
>>> >     hand if I think about monitoring 1000s of jobs I am actually not so
>>> >     sure ;)
>>> >
>>> >     Cheers,
>>> >
>>> >     Konstantin
>>> >
>>> >     On 22.03.2016 19 <tel:22.03.2016%2019>:16, David Brelloch wrote:
>>> >     > Hi all,
>>> >     >
>>> >     > We are currently evaluating flink for processing kafka messages
>>> and are
>>> >     > running into some issues. The basic problem we are trying to
>>> solve is
>>> >     > allowing our end users to dynamically create jobs to alert based
>>> off the
>>> >     > messages coming from kafka. At launch we figure we need to
>>> support at
>>> >     > least 15,000 jobs (3000 customers with 5 jobs each). I have the
>>> example
>>> >     > kafka job running and it is working great. The questions I have
>>> are:
>>> >     >
>>> >     >  1. On my local machine (admittedly much less powerful than we
>>> >     would be
>>> >     >     using in production) things fall apart once I get to around
>>> 75 jobs.
>>> >     >     Can flink handle a situation like this where we are looking
>>> at
>>> >     >     thousands of jobs?
>>> >     >  2. Is this approach even the right way to go? Is there a
>>> different
>>> >     >     approach that would make more sense? Everything will be
>>> listening to
>>> >     >     the same kafka topic so the other thought we had was to have
>>> 1 job
>>> >     >     that processed everything and was configured by a separate
>>> control
>>> >     >     kafka topic. The concern we had there was we would almost
>>> completely
>>> >     >     lose insight into what was going on if there was a slow down.
>>> >     >  3. The current approach we are using for creating dynamic jobs
>>> is
>>> >     >     building a common jar and then starting it with the
>>> configuration
>>> >     >     data for the individual job. Does this sound reasonable?
>>> >     >
>>> >     >
>>> >     > If any of these questions are answered elsewhere I apologize. I
>>> >     couldn't
>>> >     > find any of this being discussed elsewhere.
>>> >     >
>>> >     > Thanks for your help.
>>> >     >
>>> >     > David
>>> >
>>> >     --
>>> >     Konstantin Knauf * konstantin.knauf@tngtech.com
>>> >     <mailto:konstantin.knauf@tngtech.com> * +49-174-3413182
>>> >     <tel:%2B49-174-3413182>
>>> >     TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>>> >     Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>>> >     Sitz: Unterföhring * Amtsgericht München * HRB 135082
>>> >
>>> >
>>>
>>> --
>>> Konstantin Knauf * konstantin.knauf@tngtech.com * +49-174-3413182
>>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>>> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>>> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>>>
>>
>>
>
>
> --
>
> Jamie Grier
> data Artisans, Director of Applications Engineering
> @jamiegrier <https://twitter.com/jamiegrier>
> jamie@data-artisans.com
>
>

Mime
View raw message