flink-dev mailing list archives

From "Tzu-Li (Gordon) Tai" <tzuli...@apache.org>
Subject Re: Running multiple streaming jobs in same cluster
Date Mon, 13 Feb 2017 05:38:38 GMT
Hi Ozan,

From your description, it seems like your original huge job can be broken down into smaller
disconnected graphs, with only some of those graphs requiring checkpointing / snapshots.
In general, it is good practice to split the disconnected subgraphs of the execution graph
into separate jobs, so that checkpointing for each subgraph is coordinated independently.
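The split described above could look like this on the CLI, assuming each disconnected subgraph has been packaged as its own jar (the jar names and job ID below are hypothetical placeholders):

```shell
# Submit each disconnected subgraph as an independent job in detached mode.
# Each jar builds and executes exactly one subgraph; only the first one
# enables checkpointing in its own code.
flink run -d graph-with-checkpointing.jar
flink run -d stateless-graph-a.jar
flink run -d stateless-graph-b.jar

# Each job now has its own checkpoint coordination; cancelling one job
# does not affect the others.
flink list             # shows the IDs of the running jobs
flink cancel <jobID>   # cancel a single job by its ID
```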

I don’t expect a problem with 100+ jobs in one cluster, but it is worth keeping in mind the
resource usage for bookkeeping and coordination in the JobManager. With Flink's current
process model, a single JobManager handles all jobs, so it is essentially a bottleneck to
consider. There is already ongoing work under FLIP-6 to improve Flink’s process model, one
of the proposed changes being a dedicated JobManager per job. If you’re interested, you can check
it out here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
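For reference, the snapshot-and-restart update flow you describe corresponds to Flink's savepoint commands; a minimal sketch, with the job ID, savepoint path, and jar name as placeholders:

```shell
# Trigger a savepoint for the running job; the command prints the
# path where the savepoint was written.
flink savepoint <jobID>

# Cancel the old job, then resubmit the updated jar, restoring its
# state from the savepoint path printed above.
flink cancel <jobID>
flink run -s <savepointPath> updated-job.jar
```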


On February 10, 2017 at 4:40:18 PM, Ozan DENİZ (ozandeniz@outlook.com) wrote:

Hi everyone,  

We have a huge execution graph for one streaming job. To update this execution graph, we take
a snapshot of the job and restart the job from the snapshot. However, this can take too much time.

One option is splitting this huge streaming job into smaller ones. We can cancel or start new
streaming jobs (without taking a snapshot) instead of updating the huge one I explained above. However,
we will end up having 100 - 150 small streaming jobs in one cluster.

My question is:

Is it good practice to run multiple streaming jobs (more than 100) in one cluster?


