flink-issues mailing list archives

From "Gary Yao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-8488) Dispatcher does not recover jobs
Date Tue, 23 Jan 2018 09:35:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Yao updated FLINK-8488:
----------------------------
         Labels: flip-6  (was: )
    Description: 
Dispatcher does not recover jobs on failover (FLIP-6).

*Steps to reproduce*:
 # {{bin/start-cluster.sh flip6}}
 # bin/flink run -p1 -flip6 examples/batch/WordCount.jar --input /path/to/largefile.txt
 # Wait until job is running then run {{bin/jobmanager.sh stop flip6 && bin/jobmanager.sh start flip6}}
 # Wait until leader is elected and verify that no jobs are running.

*Analysis*
 * Dispatcher checks on {{submitJob}} whether the job scheduling status is {{PENDING}} and
only then allows resubmission of the job. However, the job is marked as {{RUNNING}} in ZooKeeper.

  was:
Dispatcher does not recover jobs on failover.

*Steps to reproduce*:
# {{bin/start-cluster.sh flip6}}
# bin/flink run -p1 -flip6 examples/batch/WordCount.jar --input /path/to/largefile.txt
# Wait until job is running then run {{bin/jobmanager.sh stop flip6 && bin/jobmanager.sh start flip6}}
# Wait until leader is elected and verify that no jobs are running.

*Analysis*
* Dispatcher checks on {{submitJob}} whether the job scheduling status is {{PENDING}} and
only then allows resubmission of the job. However, the job is marked as {{RUNNING}} in ZooKeeper.


> Dispatcher does not recover jobs
> --------------------------------
>
>                 Key: FLINK-8488
>                 URL: https://issues.apache.org/jira/browse/FLINK-8488
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0
>         Environment: 776af4a882c85926fc0764b702fec717c675e34c
>            Reporter: Gary Yao
>            Priority: Blocker
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> Dispatcher does not recover jobs on failover (FLIP-6).
> *Steps to reproduce*:
>  # {{bin/start-cluster.sh flip6}}
>  # bin/flink run -p1 -flip6 examples/batch/WordCount.jar --input /path/to/largefile.txt
>  # Wait until job is running then run {{bin/jobmanager.sh stop flip6 && bin/jobmanager.sh start flip6}}
>  # Wait until leader is elected and verify that no jobs are running.
> *Analysis*
>  * Dispatcher checks on {{submitJob}} whether the job scheduling status is {{PENDING}} and only then allows resubmission of the job. However, the job is marked as {{RUNNING}} in ZooKeeper.
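
The check described in the analysis can be modeled in a few lines. The following is a minimal, hypothetical Java sketch, not Flink's actual Dispatcher code: the class {{DispatcherRecoverySketch}}, the {{InMemoryRegistry}} (a stand-in for the scheduling status kept in ZooKeeper), and the simplified {{submitJob}} signature are all assumptions made for illustration. It only shows why a job whose status was left as {{RUNNING}} would be rejected when it is resubmitted after a failover.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the behavior described in the analysis.
// None of these types are Flink's real classes.
public class DispatcherRecoverySketch {

    // Assumed set of scheduling states; the report only names PENDING and RUNNING.
    enum JobSchedulingStatus { PENDING, RUNNING }

    // Stand-in for the status store that survives a JobManager restart
    // (ZooKeeper in the report). Unknown jobs are treated as PENDING.
    static final class InMemoryRegistry {
        private final Map<String, JobSchedulingStatus> statuses = new HashMap<>();

        JobSchedulingStatus getStatus(String jobId) {
            return statuses.getOrDefault(jobId, JobSchedulingStatus.PENDING);
        }

        void setStatus(String jobId, JobSchedulingStatus status) {
            statuses.put(jobId, status);
        }
    }

    // Accepts the job only if its stored status is PENDING, as the analysis describes.
    // Returns true if the job was accepted, false if it was rejected.
    static boolean submitJob(InMemoryRegistry registry, String jobId) {
        if (registry.getStatus(jobId) != JobSchedulingStatus.PENDING) {
            return false; // a job still marked RUNNING is not resubmitted
        }
        registry.setStatus(jobId, JobSchedulingStatus.RUNNING);
        return true;
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();

        // Initial submission: status is PENDING, so the job is accepted.
        System.out.println(submitJob(registry, "job-1"));

        // Simulated failover: the new leader sees the same registry contents,
        // where the job is still marked RUNNING, so recovery is rejected.
        System.out.println(submitJob(registry, "job-1"));
    }
}
```

Running {{main}} prints {{true}} for the first submission and {{false}} for the resubmission after the simulated failover, which mirrors the "no jobs are running" observation in the last reproduction step.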



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
