flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-9662) Task manager isolation for jobs
Date Wed, 27 Jun 2018 09:59:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524851#comment-16524851 ]

Till Rohrmann commented on FLINK-9662:

Thanks for drafting the design document [~liurenjie1024]. 

I was wondering whether we could achieve the same by only introducing a set of tags with which
a {{TaskManager}} can be started. When a {{TaskManager}} registers its slots, it would also
report the set of tags for each slot.

Next, we could allow a {{JobManager}} to request a slot with a certain set of tags
(basically making the tags an additional constraint for scheduling). If no such slot
exists, then it should request a new {{TaskManager}} with this set of tags.
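
A minimal sketch of how such a tag-constrained slot request might behave. Everything here is hypothetical: {{TaggedSlotPool}}, {{Slot}}, {{registerSlot}} and {{requestSlot}} are illustrative names, not existing Flink classes or methods. The sketch also assumes that a slot matches only if its tags are exactly equal to the requested tags (a containment check would be another option), and it merely simulates starting a new TaskManager by counting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these types exist in Flink. */
final class TaggedSlotPool {

    /** A slot as reported by a TaskManager, together with its tags. */
    static final class Slot {
        final String slotId;
        final Set<String> tags;

        Slot(String slotId, Set<String> tags) {
            this.slotId = slotId;
            this.tags = Set.copyOf(tags);
        }
    }

    private final List<Slot> freeSlots = new ArrayList<>();
    private int startedTaskManagers = 0;

    /** Called when a TaskManager registers a slot and reports its tags. */
    void registerSlot(String slotId, Set<String> tags) {
        freeSlots.add(new Slot(slotId, tags));
    }

    /**
     * Returns a free slot whose tags equal the requested tags (assumption: exact match).
     * If no such slot exists, simulates starting a new TaskManager with these tags.
     */
    Slot requestSlot(Set<String> requiredTags) {
        for (int i = 0; i < freeSlots.size(); i++) {
            if (freeSlots.get(i).tags.equals(requiredTags)) {
                return freeSlots.remove(i);
            }
        }
        startedTaskManagers++;
        return new Slot("new-tm-" + startedTaskManagers + "-slot-0", requiredTags);
    }

    /** Number of TaskManagers this sketch pretended to start. */
    int startedTaskManagers() {
        return startedTaskManagers;
    }
}
```

With exact matching, a request without tags would never be served from a tagged slot, which is what isolation would need; whether that is the right matching rule is an open design question.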

As a last step, once job isolation is activated, each slot requested by the
{{JobManager}} would carry the {{JobID}} as a tag.
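
As a rough illustration of this last step, the tag set of a slot request could be derived as follows. This is only a sketch under stated assumptions: {{JobIsolationTags}} and {{tagsForRequest}} are hypothetical names, the {{JobID}} is represented as a plain string, and the isolation switch is modeled as a boolean parameter rather than any real Flink configuration option.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; not part of Flink. */
final class JobIsolationTags {

    /**
     * Builds the tags for a slot request. With isolation enabled, the job's ID is
     * added as an extra tag, so requests from different jobs carry different tag sets.
     */
    static Set<String> tagsForRequest(String jobId, Set<String> baseTags, boolean isolationEnabled) {
        Set<String> tags = new HashSet<>(baseTags);
        if (isolationEnabled) {
            tags.add(jobId); // the JobID itself becomes a tag
        }
        return tags;
    }
}
```

Because two jobs with isolation enabled never produce the same tag set, a scheduler that matches on tags would not hand them slots of the same TaskManager.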

The benefit of this approach would be that it is a bit more flexible, because you could also
define groups of jobs which are allowed to share resources. Moreover, it would allow
heterogeneous TaskManagers, some of which can execute jobs that need a GPU, for example. In
such a case, a job which requires GPU support would simply request the GPU tag.

What do you think?

> Task manager isolation for jobs
> -------------------------------
>                 Key: FLINK-9662
>                 URL: https://issues.apache.org/jira/browse/FLINK-9662
>             Project: Flink
>          Issue Type: New Feature
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Renjie Liu
>            Assignee: Renjie Liu
>            Priority: Major
>             Fix For: 1.6.0
> Disable task manager sharing for different jobs.

This message was sent by Atlassian JIRA
