flink-dev mailing list archives

From "Wright, Eron" <ewri...@live.com>
Subject Re: [QUESTION] thread model in Flink makes me confused
Date Wed, 11 May 2016 23:58:57 GMT
One option is to use a separate cluster (JobManager + TaskManagers) for each job, so that each
job's tasks run in their own TaskManager processes. This is fairly straightforward with the YARN
support: "flink run" can launch a cluster for a job and tear it down afterwards.
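
A minimal Python sketch of driving this from a script, assuming the Flink 1.x CLI of this period: `-m yarn-cluster` starts a dedicated YARN cluster for the submitted job, and `-yn`, `-yjm`, `-ytm` and `-ys` set the number of TaskManagers, JobManager memory (MB), TaskManager memory (MB) and slots per TaskManager. The YARN flags changed across releases, so check `flink run --help` for your version. The helper names `per_job_yarn_command` and `launch_isolated`, the `./bin/flink` path and the jar names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def per_job_yarn_command(jar: str, task_managers: int, tm_memory_mb: int,
                         jm_memory_mb: int = 1024, slots: int = 1,
                         flink_bin: str = "./bin/flink") -> List[str]:
    """Build a `flink run` argv that starts a dedicated YARN cluster for one job.

    Flag names assume the Flink 1.x YARN CLI; verify them against your version.
    """
    if task_managers < 1 or slots < 1:
        raise ValueError("need at least one TaskManager and one slot")
    return [
        flink_bin, "run",
        "-m", "yarn-cluster",        # per-job cluster instead of a shared, long-running one
        "-yn", str(task_managers),   # number of TaskManager containers
        "-yjm", str(jm_memory_mb),   # JobManager container memory (MB)
        "-ytm", str(tm_memory_mb),   # TaskManager container memory (MB)
        "-ys", str(slots),           # slots per TaskManager
        jar,
    ]


def launch_isolated(jars: List[str], **kwargs) -> List[int]:
    """Start every job in its own cluster concurrently, then wait for all of them.

    Each job gets its own JobManager/TaskManager JVMs, so one job's OOM or log
    output stays in that job's containers. Returns the client exit codes.
    """
    procs = []
    for jar in jars:
        cmd = per_job_yarn_command(jar, **kwargs)
        print("launching:", shlex.join(cmd))
        procs.append(subprocess.Popen(cmd))
    return [p.wait() for p in procs]


# Hypothetical usage:
# launch_isolated(["job-a.jar", "job-b.jar"], task_managers=2, tm_memory_mb=4096)
```

The trade-off is the one noted below: each job pays YARN container start-up cost and shows up in its own web UI rather than a shared one.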

Of course, this means you must deploy YARN. That doesn't necessarily imply HDFS, though a
Hadoop-compatible filesystem (HCFS) is needed to hold the YARN staging directory.

This approach also facilitates richer scheduling and multi-user scenarios.   

One downside is the loss of a unified web UI to view all jobs.

> On May 11, 2016, at 8:32 AM, Jark Wu <wuchong.wc@alibaba-inc.com> wrote:
> As far as I know, Flink uses a thread model: one TaskManager process may run many operator
> threads belonging to different jobs, so tasks from different jobs compete for memory and CPU
> within the same process. In the worst case, a misbehaving job can consume most of the CPU and
> memory, possibly causing an OOM that kills the healthy jobs as well. Another problem is that
> tasks from different jobs write their logs to the same file (the TaskManager log file), which
> makes debugging harder.
> As far as I know, Storm spawns separate workers for each job, and the tasks in one worker all
> belong to the same job. So I'm confused about the purpose or advantages of Flink's design. One
> more question: are there any tips for solving the issues above, or any suggestions for
> implementing a design similar to Storm's?
> Thank you for any answers in advance!
> Regards,
> Jark Wu
