flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wouter Zorgdrager <zorgdrag...@gmail.com>
Subject Re: Statefun 2.0 questions
Date Wed, 13 May 2020 09:55:17 GMT
Dear Igal, all,

Thanks a lot. This is very helpful. I understand the architecture a bit
more now. We can just scale the stateful functions and put a load balancer
in front and Flink will contact them. The only part of the scaling I don't
understand yet is how to scale the 'Flink side'. So If I understand
correctly the Kafka ingress/egress parts runs on the Flink cluster and
contacts the remote workers through HTTP. How can I scale this Kafka part
then? For a normal Flink job I would just change the parallelism, but I
couldn't really find that option yet. Is there some value I need to set in
the module.yaml.

Once again, thanks for the help so far. It has been useful.


Op wo 13 mei 2020 om 00:03 schreef Igal Shilman <igal@ververica.com>:

> Hi Wouter,
> Triggering a stateful function from a frontend indeed requires an ingress
> between them, so the way you've approached this is also the way we were
> thinking of.
> As Gordon mentioned a potential improvement might be an HTTP ingress, that
> would allow triggering stateful functions directly from the front end
> servers.
> But this kind of ingress is not implemented yet.
> Regarding scaling: Your understanding is correct, you can scale both the
> Flink cluster and the remote "python-stateful-function" cluster
> independently.
> Scaling the Flink cluster, tho, requires taking a savepoint, bumping the
> job parallelism, and starting the cluster with more workers from the
> savepoint taken previously.
> Scaling "python-stateful-function" workers can be done transparently to
> the Flink cluster, but the exact details are deployment specific.
> - For example the python workers are a k8s service.
> - Or the python workers are deployed behind a load balancer
> - Or you add new entries to the DNS record of your python worker.
> I didn't understand "ensuring that it ends op in the correct Flink job"
> can you please clarify?
> Flink would be the one contacting the remote workers and not the other way
> around. So as long as the new instances
> are visible to Flink they would be reached with the same shared state.
> I'd recommend watching [1] and the demo at the end, and [2] for a demo
> using stateful functions on AWS lambda.
> [1] https://youtu.be/NF0hXZfUyqE
> [2] https://www.youtube.com/watch?v=tuSylBadNSo
> It seems like you are on the correct path!
> Good luck!
> Igal.
> On Tue, May 12, 2020 at 11:18 PM Wouter Zorgdrager <zorgdragerw@gmail.com>
> wrote:
>> Hi Igal, all,
>> In the meantime we found a way to serve Flink stateful functions in a
>> frontend. We decided to add another (set of) Flask application(s) which
>> link to Kafka topics. These Kafka topics then serve as ingress and egress
>> for the statefun cluster. However, we're wondering how we can scale this
>> cluster. On the documentation page some nice figures are provided for
>> different setups but no implementation details are given. In our case we
>> are using a remote cluster so we have a Docker instance containing the
>> `python-stateful-function` and of course the Flink cluster containing a
>> `master` and `worker`. If I understood correctly, in a remote setting, we
>> can scale both the Flink cluster and the `python-stateful-function`.
>> Scaling the Flink cluster is trivial because I can add just more
>> workers/task-managers (providing more taskslots) just by scaling the worker
>> instance. However, how can I scale the stateful function also ensuring that
>> it ends op in the correct Flink job (because we need shared state there). I
>> tried scaling the Docker instance as well but that didn't seem to work.
>> Hope you can give me some leads there.
>> Thanks in advance!
>> Kind regards,
>> Wouter
>> Op do 7 mei 2020 om 17:17 schreef Wouter Zorgdrager <
>> zorgdragerw@gmail.com>:
>>> Hi Igal,
>>> Thanks for your quick reply. Getting back to point 2, I was wondering if
>>> you could trigger indeed a stateful function directly from Flask and also
>>> get the reply there instead of using Kafka in between. We want to
>>> experiment running stateful functions behind a front-end (which should be
>>> able to trigger a function), but we're a bit afraid that using Kafka
>>> doesn't scale well if on the frontend side a user has to consume all Kafka
>>> messages to find the correct reply/output for a certain request/input. Any
>>> thoughts?
>>> Thanks in advance,
>>> Wouter
>>> Op do 7 mei 2020 om 10:51 schreef Igal Shilman <igal@ververica.com>:
>>>> Hi Wouter!
>>>> Glad to read that you are using Flink for quite some time, and also
>>>> exploring with StateFun!
>>>> 1) yes it is correct and you can follow the Dockerhub contribution PR
>>>> at [1]
>>>> 2) I’m not sure I understand what do you mean by trigger from the
>>>> browser.
>>>> If you mean, for testing / illustration purposes triggering the
>>>> function independently of StateFun, you would need to write some JavaScript
>>>> and preform the POST (assuming CORS are enabled)
>>>> Let me know if you’d like getting further information of how to do it.
>>>> Broadly speaking, GET is traditionally used to get data from a resource
>>>> and POST to send data (the data is the invocation batch in our case).
>>>> One easier walk around for you would be to expose another endpoint in
>>>> your Flask application, and call your stateful function directly from there
>>>> (possibly populating the function argument with values taken from the query
>>>> params)
>>>> 3) I would expect a performance loss when going from the embedded SDK
>>>> to the remote one, simply because the remote function is at a different
>>>> process, and a round trip is required. There are different ways of
>>>> deployment even for remote functions.
>>>> For example they can be co-located with the Task managers and
>>>> communicate via the loop back device /Unix domain socket, or they can be
>>>> deployed behind a load balancer with an auto-scaler, and thus reacting to
>>>> higher request rate/latency increases by spinning new instances (something
>>>> that is not yet supported with the embedded API)
>>>> Good luck,
>>>> Igal.
>>>> [1] https://github.com/docker-library/official-images/pull/7749
>>>> On Wednesday, May 6, 2020, Wouter Zorgdrager <zorgdragerw@gmail.com>
>>>> wrote:
>>>>> Hi all,
>>>>> I've been using Flink for quite some time now and for a university
>>>>> project I'm planning to experiment with statefun. During the walkthrough
>>>>> I've run into some issues, I hope you can help me with.
>>>>> 1) Is it correct that the Docker image of statefun is not yet
>>>>> published? I couldn't find it anywhere, but was able to run it by building
>>>>> the image myself.
>>>>> 2) In the example project using the Python SDK, it uses Flask to
>>>>> expose a function using POST. Is there also a way to serve GET request
>>>>> that you can trigger a stateful function by for instance using your
>>>>> browser?
>>>>> 3) Do you expect a lot of performance loss when using the Python SDK
>>>>> over Java?
>>>>> Thanks in advance!
>>>>> Regards,
>>>>> Wouter

View raw message