flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1908) JobManager startup delay isn't considered when using start-cluster.sh script
Date Mon, 20 Apr 2015 13:35:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502792#comment-14502792 ]

ASF GitHub Bot commented on FLINK-1908:
---------------------------------------

Github user DarkKnightCZ commented on a diff in the pull request:

    https://github.com/apache/flink/pull/609#discussion_r28688289
  
    --- Diff: flink-dist/src/main/flink-bin/bin/start-cluster.sh ---
    @@ -37,6 +37,26 @@ fi
     # cluster mode, bring up job manager locally and a task manager on every slave host
     "$FLINK_BIN_DIR"/jobmanager.sh start cluster
     
    +# wait until jobmanager starts
    +JOBMANAGER_ADDR=$(readFromConfig ${KEY_JOBM_RPC_ADDR} "${DEFAULT_JOBM_RPC_ADDR}" "${YAML_CONF}")
    +JOBMANAGER_PORT=$(readFromConfig ${KEY_JOBM_RPC_PORT} "${DEFAULT_JOBM_RPC_PORT}" "${YAML_CONF}")
    +
    +echo "Waiting for job manager"
    +for i in {1..30}; do
    +  nc -z "${JOBMANAGER_ADDR}" $JOBMANAGER_PORT
    --- End diff --
    
    @rmetzger 
    Since "-z" only probes the port (no actual payload is sent), nothing is visible in the logs. If you do send some data, it is correctly logged as a WARN "incorrect header" by org.apache.flink.runtime.ipc.Server.


> JobManager startup delay isn't considered when using start-cluster.sh script
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-1908
>                 URL: https://issues.apache.org/jira/browse/FLINK-1908
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>    Affects Versions: 0.9, 0.8.1
>         Environment: Linux
>            Reporter: Lukas Raska
>            Priority: Minor
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> When starting a Flink cluster via the start-cluster.sh script, JobManager startup can be delayed (as it's started asynchronously), which can result in failed startup of several task managers.
> The solution is to wait a certain amount of time, periodically checking whether the RPC port is accessible, and then proceed with starting the task managers.
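
The wait-and-poll approach described in the issue could be sketched roughly as below. This is only an illustration, not the code from the pull request: the function name `wait_for_port`, its argument order, and the `PROBE` override variable are all hypothetical. By default it probes with `nc -z`, which only opens and closes a TCP connection without sending a payload.

```shell
# Hypothetical helper (not part of Flink): poll a TCP port until it accepts
# connections or the attempts run out.
# Arguments: host, port, max attempts (default 30), delay in seconds (default 1).
# PROBE is an assumed override hook for the check command; default is "nc -z".
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" delay="${4:-1}"
  local probe="${PROBE:-nc -z}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # "nc -z" only checks that the port is open; no data is sent
    if $probe "$host" "$port" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  # the port never became reachable within the allowed attempts
  return 1
}
```

In a startup script this might be called as `wait_for_port "${JOBMANAGER_ADDR}" "${JOBMANAGER_PORT}" 30 1`, starting the task managers only when it returns 0 and reporting an error otherwise.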



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
