flink-issues mailing list archives

From "Ufuk Celebi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4021) Problem of setting autoread for netty channel when more tasks sharing the same Tcp connection
Date Wed, 17 Aug 2016 09:12:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424179#comment-15424179 ]

Ufuk Celebi commented on FLINK-4021:

After you address the comments (replace return and add the test) and add another commit to
the pull request branch, I would go ahead and merge this. :-) If you push to the PR branch,
it will be automatically reflected in the PR on GitHub.

> Problem of setting autoread for netty channel when more tasks sharing the same Tcp connection
> ---------------------------------------------------------------------------------------------
>                 Key: FLINK-4021
>                 URL: https://issues.apache.org/jira/browse/FLINK-4021
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.0.2
>            Reporter: Zhijiang Wang
>            Assignee: Zhijiang Wang
> More than one task can share the same TCP connection for shuffling data.
> If a downstream task, say "A", has no available memory segment to read a netty buffer
> from the network, it sets autoread to false for the channel.
> When task A fails or has available segments again, the netty handler is notified to
> process the staged buffers first and then reset autoread to true. In some scenarios,
> however, autoread is never set back to true.
> This happens because, when processing the staged buffers, the handler first finds the
> input channel corresponding to each buffer. If the task for that input channel has failed,
> the decodeMsg method in PartitionRequestClientHandler returns false, so the step that
> sets autoread back to true is never reached.
> In summary: task "A" sets autoread to false because it has no available segments, which
> leaves some buffers staged. Another task "B", to which one of those staged buffers belongs,
> then fails unexpectedly. When task A tries to reset autoread to true, it cannot, because
> task B has failed.
> I have fixed this problem in our application by adding a boolean parameter to the
> decodeBufferOrEvent method to distinguish whether the method is invoked by the netty IO
> thread on channel read or by the staged message handler task in PartitionRequestClientHandler.
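
The failure mode described above can be reproduced in a small standalone model. The sketch below does not use Netty or Flink classes. The class AutoReadSketch, its methods, and the way the boolean flag changes the return value are all assumptions made for illustration, not the actual PartitionRequestClientHandler code or the actual patch. It only shows why an early "false" from decoding a buffer for a failed task can leave autoread disabled, and how telling the two call paths apart could avoid that.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Set;

// Hypothetical model of the staged-buffer replay; not Flink's real handler.
public class AutoReadSketch {

    // Models decoding one buffer addressed to receiverTask.
    // Returns true if replay may continue with the next staged buffer.
    // isStagedReplay marks a call from the staged message handler rather than
    // from the netty IO thread read path (the assumed role of the new boolean).
    static boolean decode(String receiverTask, Set<String> failedTasks,
                          boolean isStagedReplay, boolean withFix) {
        if (failedTasks.contains(receiverTask)) {
            // The receiver has failed, so the buffer is dropped.
            // Without the fix this reports false and aborts the replay.
            return withFix && isStagedReplay;
        }
        return true;
    }

    // Replays all staged buffers and returns the resulting autoread state.
    static boolean replayStaged(Deque<String> staged, Set<String> failedTasks,
                                boolean withFix) {
        while (!staged.isEmpty()) {
            String receiverTask = staged.poll();
            if (!decode(receiverTask, failedTasks, true, withFix)) {
                return false; // early exit: autoread is never re-enabled
            }
        }
        return true; // all staged buffers handled: autoread is re-enabled
    }

    // Scenario from the issue: buffers for A and B are staged, then B fails.
    static boolean run(boolean withFix) {
        Deque<String> staged = new ArrayDeque<>(Arrays.asList("A", "B", "A"));
        Set<String> failedTasks = Collections.singleton("B");
        return replayStaged(staged, failedTasks, withFix);
    }

    public static void main(String[] args) {
        System.out.println("without fix: autoRead=" + run(false));
        System.out.println("with fix:    autoRead=" + run(true));
    }
}
```

In this model the run without the flag ends with autoread still false, while the run with the flag drains the whole queue and re-enables it. Whether the real fix skips the buffer, releases it, or handles it some other way is not stated in the issue text.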

This message was sent by Atlassian JIRA
