flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3927) TaskManager registration may fail if Yarn versions don't match
Date Thu, 19 May 2016 16:46:13 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291484#comment-15291484 ]

ASF GitHub Bot commented on FLINK-3927:
---------------------------------------

GitHub user mxm opened a pull request:

    https://github.com/apache/flink/pull/2013

    [FLINK-3927][yarn] make container id consistent across Hadoop versions

    Fixes a bug where the generated container id could differ between the Hadoop versions
    of the client and the cluster. The ResourceManager assumes a persistent resource id.
    
    Based on #2012, to re-enable the Yarn tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mxm/flink FLINK-3927

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2013.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2013
    
----
commit 422e078c93b558dba3d0c6a53643824198e2c545
Author: Maximilian Michels <mxm@apache.org>
Date:   2016-05-19T12:29:12Z

    [FLINK-3927][yarn] make container id consistent across Hadoop versions
    
    - introduce a unique container id independent of the Hadoop version
    - improve printing of exceptions during registration
    - minor improvements to the Yarn ResourceManager code

commit c27fc8553f4dc0fbcee09c52848477cff2de0b11
Author: Maximilian Michels <mxm@apache.org>
Date:   2016-05-19T15:59:23Z

    [FLINK-3938] re-enable Yarn tests
    
    As of 70978f560fa5cab6d84ec27d58faa2627babd362, the Yarn tests were not
    executed anymore. They were moved to the test directory but there was
    still a Maven configuration in place to change the test directory.

----


> TaskManager registration may fail if Yarn versions don't match
> --------------------------------------------------------------
>
>                 Key: FLINK-3927
>                 URL: https://issues.apache.org/jira/browse/FLINK-3927
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager
>    Affects Versions: 1.1.0
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>             Fix For: 1.1.0
>
>
> Flink's ResourceManager uses the Yarn container ids to identify connecting TaskManagers.
> Yarn's stringified container id may not be consistent across different Hadoop versions, e.g.
> Hadoop 2.3.0 and Hadoop 2.7.1. The ResourceManager gets the id from the Yarn reports, while
> the TaskManager infers it from the Yarn environment variables. For example, the ResourceManager
> may use Hadoop 2.3.0 while the cluster runs Hadoop 2.7.1, in which case the two ids may not
> match and the TaskManager registration may fail.
> The solution is to pass the id through a custom environment variable, which the ResourceManager
> sets before launching the TaskManager in the container. That way both sides always use the
> Hadoop client's id generation method.
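
The handoff described above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the pull request: the class name, the method names, the variable name `_FLINK_CONTAINER_ID`, and the sample id strings are all assumptions made up for this example. Only `CONTAINER_ID` is the variable that Yarn itself sets inside a container.

```java
import java.util.HashMap;
import java.util.Map;

public class ContainerIdHandoff {

    // Hypothetical name for the custom variable; the real name may differ.
    static final String ENV_FLINK_CONTAINER_ID = "_FLINK_CONTAINER_ID";

    // ResourceManager side: copy the base environment and add the id exactly
    // as the client-side Hadoop version stringified it.
    static Map<String, String> buildLaunchEnvironment(
            String clientSideContainerId, Map<String, String> baseEnv) {
        Map<String, String> env = new HashMap<>(baseEnv);
        env.put(ENV_FLINK_CONTAINER_ID, clientSideContainerId);
        return env;
    }

    // TaskManager side: read the id from the custom variable instead of
    // deriving it from the variables that the cluster's Yarn version sets.
    static String resolveResourceId(Map<String, String> env) {
        String id = env.get(ENV_FLINK_CONTAINER_ID);
        if (id == null || id.isEmpty()) {
            throw new IllegalStateException(
                "Missing environment variable " + ENV_FLINK_CONTAINER_ID);
        }
        return id;
    }

    public static void main(String[] args) {
        // Simulated container environment. The value of CONTAINER_ID stands in
        // for whatever string the cluster's Hadoop version would produce.
        Map<String, String> yarnEnv = new HashMap<>();
        yarnEnv.put("CONTAINER_ID", "id-as-rendered-by-cluster");

        Map<String, String> launchEnv =
            buildLaunchEnvironment("id-as-rendered-by-client", yarnEnv);

        // The TaskManager registers under the same string the ResourceManager knows.
        System.out.println(resolveResourceId(launchEnv));
    }
}
```

Running `main` prints `id-as-rendered-by-client`: the TaskManager reports the string the ResourceManager already holds, regardless of how the cluster's own `CONTAINER_ID` value is rendered.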



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
