giraph-dev mailing list archives

From "Sergey Edunov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-1077) Jobs getting stuck after channel failure
Date Fri, 14 Oct 2016 00:46:20 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1077:
----------------------------------
    Fix Version/s: 1.2.0

> Jobs getting stuck after channel failure
> ----------------------------------------
>
>                 Key: GIRAPH-1077
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1077
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>             Fix For: 1.2.0
>
>
> When a channel fails, we currently just log the failure. Since we don't wait on open
> requests from every place, the check for open requests is not always called, and we've
> seen jobs stay stuck, for example during the input stage, when a worker's request to the
> master for a split to read fails. When we know that a channel has failed, we should try
> to resend the requests from that channel.
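
The proposed behavior can be illustrated with a minimal sketch. This is not Giraph's actual client code: the class OpenRequestTracker, its methods, and the sender hook are all hypothetical names introduced for illustration. It assumes open (unacknowledged) requests are tracked by id together with the channel they were sent on, and that a channel-failure callback can name a replacement channel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical tracker of open requests; an illustration, not Giraph's API. */
class OpenRequestTracker {
  /** One request that has been sent but not yet acknowledged. */
  static final class OpenRequest {
    final int channelId;
    final String payload;

    OpenRequest(int channelId, String payload) {
      this.channelId = channelId;
      this.payload = payload;
    }
  }

  /** Request id -> open request. */
  private final Map<Long, OpenRequest> openRequests = new ConcurrentHashMap<>();
  /** Assumed transport hook: writes a payload to the given channel. */
  private final BiConsumer<Integer, String> sender;

  OpenRequestTracker(BiConsumer<Integer, String> sender) {
    this.sender = sender;
  }

  /** Records the request as open, then sends it. */
  void send(long requestId, int channelId, String payload) {
    openRequests.put(requestId, new OpenRequest(channelId, payload));
    sender.accept(channelId, payload);
  }

  /** A response arrived, so the request is no longer open. */
  void onResponse(long requestId) {
    openRequests.remove(requestId);
  }

  /**
   * Intended to be called from the channel-failure callback instead of only
   * logging: resends every open request that was sent on the failed channel.
   * Returns the number of requests resent.
   */
  int onChannelFailure(int failedChannelId, int replacementChannelId) {
    int resent = 0;
    for (Map.Entry<Long, OpenRequest> entry : openRequests.entrySet()) {
      OpenRequest request = entry.getValue();
      if (request.channelId == failedChannelId) {
        // Re-home the request on the replacement channel and send it again.
        openRequests.put(entry.getKey(),
            new OpenRequest(replacementChannelId, request.payload));
        sender.accept(replacementChannelId, request.payload);
        resent++;
      }
    }
    return resent;
  }

  int openCount() {
    return openRequests.size();
  }

  public static void main(String[] args) {
    List<String> wire = new ArrayList<>();
    OpenRequestTracker tracker =
        new OpenRequestTracker((channel, payload) -> wire.add(channel + ":" + payload));
    tracker.send(1L, 7, "split-request");     // worker asks master for a split
    int resent = tracker.onChannelFailure(7, 9); // channel 7 fails before a reply
    System.out.println("resent=" + resent + " wire=" + wire);
  }
}
```

The point of the sketch is that the resend is driven by the failure event itself, so it does not depend on some other code path happening to wait on open requests. A real fix would also need to handle reconnecting the channel and duplicate delivery on the receiving side, which are left out here.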



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
