flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3184) Decrease Akka timeouts on cluster side to make system more responsive
Date Fri, 18 Dec 2015 13:33:46 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063947#comment-15063947

ASF GitHub Bot commented on FLINK-3184:

Github user uce commented on the pull request:

    I've created https://docs.google.com/document/d/1987ydc2rez79Pph7qBbcXwu6XzMU2Hm-nX8kORLoZBM/edit?usp=sharing
and added the config renaming as an API-breaking change. I will share this list on the mailing list.

> Decrease Akka timeouts on cluster side to make system more responsive
> ---------------------------------------------------------------------
>                 Key: FLINK-3184
>                 URL: https://issues.apache.org/jira/browse/FLINK-3184
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Minor
> Currently, the default timeout for futures is set to 100 s. This is also the time used to
> wait between restart attempts if no other value has been explicitly specified. Especially
> in the streaming case, it is often necessary to detect and react to failures in a shorter
> period than 100 s. Therefore, I propose to decrease the default timeout to 10 s.
> Additionally, I propose to introduce a slightly higher timeout for the client side (e.g.
> 60 s). The reason is that in case of a {{JobManager}} failure the client has to wait until
> the cluster has recovered. Recovery via ZooKeeper can take longer than 10 s. In such a
> case, a recovery could be falsely recognized as a lost connection.
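
The reasoning behind the two separate timeouts can be illustrated with a small sketch. It is not Flink code: the `Timeouts` class, the `outcome` function, and the 25 s recovery duration are hypothetical and chosen only for illustration. The 10 s and 60 s values are the ones proposed in the issue description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    """Hypothetical holder for the two timeouts proposed in the issue."""
    cluster_seconds: float = 10.0  # proposed default for cluster-side futures
    client_seconds: float = 60.0   # proposed, slightly higher client-side value


def outcome(recovery_seconds: float, timeout_seconds: float) -> str:
    """Classify a wait: did the cluster recover before the timeout expired?"""
    if recovery_seconds <= timeout_seconds:
        return "recovered"
    return "lost connection"


timeouts = Timeouts()
recovery = 25.0  # assumed duration of a ZooKeeper-based recovery, in seconds

# With the short cluster-side value, the recovery is misread as a lost connection.
print(outcome(recovery, timeouts.cluster_seconds))
# With the longer client-side value, the client waits long enough to see the recovery.
print(outcome(recovery, timeouts.client_seconds))
```

Under these assumptions, a client using the 10 s value would give up while the cluster is still recovering, which is the false "lost connection" case the description warns about.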

This message was sent by Atlassian JIRA
