giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-267) Jobs can get killed for not reporting status during INPUT SUPERSTEP
Date Thu, 26 Jul 2012 15:21:35 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423143#comment-13423143 ]

Eli Reisman commented on GIRAPH-267:
------------------------------------

Sorry, let me explain that better; I was in the middle of two conversations at once last night!
I meant that the other patch didn't publish the context into new parts of the code, or put
the context into waitMsec inside waitForever. It kept the fix localized to BspServiceWorker
during INPUT_SUPERSTEP, since that was where the problem happened no matter how much data I
shoveled in at the beginning. I called waitMsec explicitly there so that all the progress calls
were in one place and you could see where and how often they were needed. I had been asked
repeatedly why progress calls were needed at all, and their need and placement were hard to
understand for folks who had not run into this problem yet. Once the load-in was done, I never
saw supersteps take very long, so the problem seemed self-contained. I'm surprised no committer
left me a comment if that solution was unsavory. I agree this patch cuts down on repetition in
the code! This is a great fix, nice work Jaeho!
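
For readers who have not hit this problem, here is a rough sketch of the idea under
discussion: a lock whose waitForever() waits in bounded slices and reports progress between
slices, so the framework does not kill a worker that is idle but healthy. This is not the
code from either patch. The class name ProgressReportingLock, the ProgressReporter
interface, and the slice length are all assumptions made for illustration; the real Giraph
PredicateLock and the Hadoop context may look quite different.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: a one-shot event lock that reports progress while waiting. */
final class ProgressReportingLock {
    /** Hypothetical stand-in for a Hadoop-style progress callback. */
    interface ProgressReporter {
        void progress();
    }

    private final Lock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    private final ProgressReporter reporter;
    private final long sliceMsecs; // assumed slice length, well below the task timeout
    private boolean eventOccurred = false;

    ProgressReportingLock(ProgressReporter reporter, long sliceMsecs) {
        this.reporter = reporter;
        this.sliceMsecs = sliceMsecs;
    }

    /** Mark the event as having occurred and wake up every waiter. */
    void signal() {
        lock.lock();
        try {
            eventOccurred = true;
            cond.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Wait up to msecs for the event; returns true if it occurred. */
    boolean waitMsecs(long msecs) {
        lock.lock();
        try {
            long remainingNanos = TimeUnit.MILLISECONDS.toNanos(msecs);
            // Loop to guard against spurious wakeups.
            while (!eventOccurred && remainingNanos > 0) {
                remainingNanos = cond.awaitNanos(remainingNanos);
            }
            return eventOccurred;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } finally {
            lock.unlock();
        }
    }

    /** Wait until the event occurs, reporting progress after every slice that times out. */
    void waitForever() {
        while (!waitMsecs(sliceMsecs)) {
            reporter.progress();
        }
    }
}
```

With the progress call inside waitForever(), callers such as the worker setup code would
not need their own explicit waitMsec loops, which is the reduction in repetition mentioned
above. The trade-off is that the progress calls are no longer visible at the call site.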

                
> Jobs can get killed for not reporting status during INPUT SUPERSTEP
> -------------------------------------------------------------------
>
>                 Key: GIRAPH-267
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-267
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>    Affects Versions: 0.2.0
>         Environment: Facebook Hadoop
>            Reporter: Jaeho Shin
>            Assignee: Jaeho Shin
>             Fix For: 0.2.0
>
>         Attachments: 0001-Made-PredicateLock-report-progress-and-removed-Conte.patch, GIRAPH-267.patch, GIRAPH-267.patch
>
>
> A job with a skewed and long (>600 secs in my case) INPUT_SUPERSTEP fails because some tasks
do not report their status.  From BspServiceWorker#setup(), I could tell that while some workers
were still loading inputSplits, others finished theirs early, hung on PredicateLock#waitForever(),
and got killed after the timeout.

