flink-issues mailing list archives

From "Robert Metzger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3003) Add container allocation timeout to YARN CLI
Date Mon, 14 Dec 2015 10:11:46 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055746#comment-15055746 ]

Robert Metzger commented on FLINK-3003:

Gang scheduling support in YARN would be the right way to solve this issue, but it seems that
it will take some time before the YARN developers even decide to implement it: https://issues.apache.org/jira/browse/YARN-624
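Gang scheduling means granting a job all of the containers it asks for at once, or none of them. Below is a minimal sketch of that idea: a toy in-memory model, not YARN's API and not the design discussed in YARN-624. `ToyCluster` and `gang_allocate` are hypothetical names. Because a refused request holds nothing while it waits, two sessions can never each sit on part of the cluster.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical cluster model: a fixed pool of identical containers."""
    total: int
    free: int = field(init=False)

    def __post_init__(self) -> None:
        self.free = self.total

    def gang_allocate(self, requested: int) -> bool:
        """All-or-nothing: grant every requested container, or none of them."""
        if requested <= self.free:
            self.free -= requested
            return True
        return False  # the request keeps waiting, but holds no containers

    def release(self, count: int) -> None:
        self.free += count


cluster = ToyCluster(total=10)
first = cluster.gang_allocate(6)    # granted, 4 containers left
second = cluster.gang_allocate(6)   # refused, holds nothing
print(first, second, cluster.free)  # True False 4
cluster.release(6)                  # the first session finishes
print(cluster.gang_allocate(6))     # True: the second session can now start
```

With the same numbers as in the issue description (10 containers, two sessions wanting 6 each), one session runs and the other simply waits, instead of both ending up stuck at 5.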

> Add container allocation timeout to YARN CLI
> --------------------------------------------
>                 Key: FLINK-3003
>                 URL: https://issues.apache.org/jira/browse/FLINK-3003
>             Project: Flink
>          Issue Type: Improvement
>          Components: YARN Client
>    Affects Versions: 0.10.0
>            Reporter: Ufuk Celebi
>             Fix For: 1.0.0
> Programs submitted via {{bin/flink run -m yarn-cluster}} start a short-lived YARN session
before submitting the job. The job is only submitted once all requested resources have been allocated.
Until then, all allocated containers are "blocked" by the job that has yet to be submitted, while the
YARN session is still only partially allocated.
> If there are several such submissions with partial allocations, they can block the
whole YARN cluster (e.g., 10 containers in total, two sessions that want 6 containers each, and
each has been allocated 5, so neither can ever get its sixth container).
> A simple workaround for these situations is to add an allocation timeout after which
the YARN session fails and releases all of its resources.
> [Other strategies are also possible, e.g., waiting X amount of time for Y containers and then
going ahead with whatever has been allocated if not all containers arrive.]
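The deadlock and both proposed remedies can be illustrated with a small discrete-time simulation. This is a sketch under stated assumptions, not Flink's implementation: the issue does not specify how containers arrive or how the timeout would work. Here a toy allocator hands out one free container per tick to each waiting session in turn. `simulate`, `timeouts` (per-session deadline in ticks) and `min_containers` (the "Y containers" fallback) are hypothetical names, not Flink CLI options.

```python
from typing import Dict, Optional, Tuple


def simulate(total: int, wanted: Dict[str, int],
             timeouts: Optional[Dict[str, int]] = None,
             min_containers: Optional[int] = None,
             max_ticks: int = 100) -> Tuple[Dict[str, str], int]:
    """Toy model of sessions that acquire containers one at a time.

    Returns each session's final state and the number of free containers.
    """
    timeouts = timeouts or {}
    free = total
    held = {name: 0 for name in wanted}
    state = {name: "waiting" for name in wanted}
    for tick in range(1, max_ticks + 1):
        # Round-robin: every waiting session may grab one free container per tick.
        for name in wanted:
            if state[name] == "waiting" and free > 0:
                held[name] += 1
                free -= 1
                if held[name] == wanted[name]:
                    state[name] = "running"
        # Allocation timeout: either proceed with a partial allocation
        # (fallback strategy) or fail and release everything (simple workaround).
        for name, deadline in timeouts.items():
            if state[name] != "waiting" or tick < deadline:
                continue
            if min_containers is not None and held[name] >= min_containers:
                state[name] = "running-partial"
            else:
                state[name] = "failed"
                free += held[name]
                held[name] = 0
        if "waiting" not in state.values():
            break
    return state, free


# No timeout: both sessions hold 5 of 10 containers forever.
print(simulate(10, {"A": 6, "B": 6}))
# ({'A': 'waiting', 'B': 'waiting'}, 0)

# Session A times out after 6 ticks and releases its 5 containers; B completes.
print(simulate(10, {"A": 6, "B": 6}, timeouts={"A": 6}))
# ({'A': 'failed', 'B': 'running'}, 4)

# Fallback: after 6 ticks, accept any allocation of at least 4 containers.
print(simulate(10, {"A": 6, "B": 6}, timeouts={"A": 6, "B": 6}, min_containers=4))
# ({'A': 'running-partial', 'B': 'running-partial'}, 0)
```

Note that if both sessions use the same timeout without a fallback, both fail at the same tick. The cluster is freed, but neither job runs until it is resubmitted.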

This message was sent by Atlassian JIRA
