flink-dev mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-1352) Buggy registration from TaskManager to JobManager
Date Mon, 05 Jan 2015 16:34:35 GMT
Stephan Ewen created FLINK-1352:

             Summary: Buggy registration from TaskManager to JobManager
                 Key: FLINK-1352
                 URL: https://issues.apache.org/jira/browse/FLINK-1352
             Project: Flink
          Issue Type: Bug
          Components: JobManager, TaskManager
    Affects Versions: 0.9
            Reporter: Stephan Ewen
            Assignee: Till Rohrmann
             Fix For: 0.9

The JobManager's InstanceManager may refuse the registration attempt from a TaskManager, because
it has this TaskManager already connected, or, in the future, because the TaskManager has been
blacklisted as unreliable.

Upon refused registration, the instance ID is null, to signal the refusal.
The TaskManager reacts incorrectly to such responses, assuming successful registration.

Possible solution: The JobManager sends back a dedicated "RegistrationRefused" message if the
InstanceManager returns null as the registration result. If the TaskManager receives that
before being registered, it knows that an earlier registration response was lost (which should
not happen on TCP, and would indicate a corrupt connection).
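
A minimal sketch of the proposed exchange, in plain Java without any actor framework. Only the
message name "RegistrationRefused" and the null return of the InstanceManager come from the
description above; every class, method, and field name below (SketchJobManager,
RegistrationAcknowledged, and so on) is hypothetical and is not Flink's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical message types; only the name "RegistrationRefused" is from the proposal.
interface RegistrationResponse {}

final class RegistrationAcknowledged implements RegistrationResponse {
    final UUID instanceId;
    RegistrationAcknowledged(UUID instanceId) { this.instanceId = instanceId; }
}

final class RegistrationRefused implements RegistrationResponse {
    final String reason;
    RegistrationRefused(String reason) { this.reason = reason; }
}

// Stand-in for the InstanceManager: returns null when it refuses a registration.
final class SketchInstanceManager {
    private final Map<String, UUID> registered = new HashMap<>();

    UUID register(String taskManagerAddress) {
        if (registered.containsKey(taskManagerAddress)) {
            return null; // this TaskManager is already connected
        }
        UUID id = UUID.randomUUID();
        registered.put(taskManagerAddress, id);
        return id;
    }
}

// JobManager side: translate the null result into an explicit refusal message.
final class SketchJobManager {
    private final SketchInstanceManager instanceManager = new SketchInstanceManager();

    RegistrationResponse handleRegistration(String taskManagerAddress) {
        UUID id = instanceManager.register(taskManagerAddress);
        if (id == null) {
            return new RegistrationRefused("already registered: " + taskManagerAddress);
        }
        return new RegistrationAcknowledged(id);
    }
}

// TaskManager side: a refusal must never be treated as a successful registration.
final class SketchTaskManager {
    private UUID instanceId;                      // null until an acknowledgement arrives
    private boolean refusedWhileUnregistered;     // hints at a lost acknowledgement

    boolean isRegistered() { return instanceId != null; }
    boolean wasRefusedWhileUnregistered() { return refusedWhileUnregistered; }

    void onResponse(RegistrationResponse response) {
        if (response instanceof RegistrationAcknowledged) {
            instanceId = ((RegistrationAcknowledged) response).instanceId;
        } else if (response instanceof RegistrationRefused && !isRegistered()) {
            refusedWhileUnregistered = true;
        }
    }
}
```

The point of the separate message type is that the TaskManager can tell a refusal from an
acknowledgement by type, instead of inspecting a possibly null instance ID.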

Follow-up question: Does it make sense to have the TaskManager try indefinitely to connect
to the JobManager, with an increasing interval (from seconds to minutes)?
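
One way the increasing retry interval could look is a capped exponential backoff. This is only
a sketch of the idea raised in the question: the class name RegistrationBackoff, the doubling
factor, and the example bounds (1 second initial, 5 minutes maximum) are assumptions, not
values taken from Flink.

```java
// Hypothetical helper: computes the wait before the n-th registration attempt.
final class RegistrationBackoff {
    private final long initialMillis;
    private final long maxMillis;

    RegistrationBackoff(long initialMillis, long maxMillis) {
        if (initialMillis <= 0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException("require 0 < initialMillis <= maxMillis");
        }
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
    }

    // attempt is 0-based; the delay doubles per attempt and is capped at maxMillis.
    long delayForAttempt(int attempt) {
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            // Compare against half the cap first so the doubling cannot overflow.
            delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
        }
        return delay;
    }
}
```

With new RegistrationBackoff(1000, 300000), the delays start at 1 second, double on each
attempt, and stay at 5 minutes once the cap is reached, so the TaskManager can keep retrying
indefinitely without flooding the JobManager.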

This message was sent by Atlassian JIRA