flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Claudia Wegmann <c.wegm...@kasasi.de>
Subject dynamic streams and patterns
Date Mon, 11 Jul 2016 07:39:25 GMT
Hey everyone,

I'm quite new to Apache Flink. I'm trying to build a system with Flink and wanted to hear
your opinion and whether the proposed architecture is even possible with Flink. The environment
for the system will be a microservice architecture handling messaging via async events.

I want to give you a brief description of the system:

-          there are a lot of sensors, which each produces a stream of data

-          on each stream of sensor data I want to match one or more patterns via Flink's
CEP library

-          each of these sensors belongs to one or more administrative entities

-          each pattern belongs to one administrative entity and needs to be evaluated on
one or more sensors of this entity

-          the user can change the connection of a sensor to an administrative entity as well
as the sensors on which a pattern needs to be evaluated


I hope this description is enough to give you an overview of the system.

This is what I am thinking of doing:

-          I will have an Apache Kafka cluster and a Flink cluster running inside docker containers

-          I create a topic in Kafka for each administrative entity

-          for each entity I create a Flink job which consumes the corresponding topic

-          the Flink job creates a stream of the sensor data

-          it splits the stream to a stream for each sensor

-          for each pattern that hast to be evaluated on one stream I create a pattern stream


This results in the following:

-          there will be a lot of Kafka topics

-          for each topic there will be one Flink job (-> a lot of jobs, too)

-          in each job there will be quite a lot of streams and patterns and therefore even
more pattern streams


The main questions that arose while thinking of this implementation:

1.)    From other questions here, I know that there is currently no way to dynamically add
taskmanagers to the Flink cluster. The proposed way to handle that, is to start up much more
taskmanagers than first needed. Is it even possible to have a great number of jobs on one
cluster?

2.)    Would a viable alternative be to just dynamically start up a new cluster for each administrative
entity?

3.)    I also came to know, that Flink isn't able to handle dynamically created streams and
patterns. I guess that is due to the fixed calculation of the execution graph at the jobs
beginning. Is there a way to make Flink recalculate the graph of a running job? I also just
found out about this [1] example, where they use scripts to hot deploy queries. I will look
into that, too. Maybe that provides an acceptable solution for me, too.

4.)    Is it even possible to have a great number of streams and patterns in one Flink job?


Any comments and feedback are greatly appreciated.
Thanks a lot in advance :)
Best, Claudia

[1]: https://techblog.king.com/rbea-scalable-real-time-analytics-king/


Mime
View raw message