giraph-dev mailing list archives

From "Maja Kabiljo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-297) Checkpointing on master is done one superstep later
Date Mon, 13 Aug 2012 19:00:40 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433426#comment-13433426 ]

Maja Kabiljo commented on GIRAPH-297:
-------------------------------------

The easiest fix here would be to switch the execution order so that the vertices run first and
the master runs second. That would make everything consistent, and it makes just as much sense
to me as the current order does. However, if we want to keep the current order, I think we
should make some changes in the implementation. It's true that users should not need to look at
this code, but it could still lead us to introduce mistakes when we change other parts. I will
think about the implementation a bit more and comment then.

In the meantime, changing the order alone won't fix the problem described in this issue. We want
the master to call finalizeCheckpoint after all the workers have written their checkpoint data,
but we have no indication of when the workers have done that. We also currently call it after
master.compute for the next superstep, which is wrong. I attached a patch that fixes this by
adding another barrier at that point. This isn't really a big deal, since the master would be
waiting for the workers to finish at that point anyway. Apart from fixing what was incorrect, I
think this is better because we no longer have to wait for the computation for superstep X to
finish before finalizing the checkpoint for superstep X-1.
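
To illustrate the idea of the extra barrier, here is a minimal single-process sketch. It is not
the code from GIRAPH-297.patch and does not use Giraph's real coordination mechanism: the class
CheckpointBarrierSketch, the method runCheckpoint, and the event strings are all hypothetical,
and a java.util.concurrent.CountDownLatch stands in for the barrier. It only shows the ordering
we want: the master finalizes a checkpoint once every worker has reported that its checkpoint
data is written, without waiting for the workers' compute() calls to finish.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only; these are not Giraph classes.
class CheckpointBarrierSketch {

    // Simulates one checkpointed superstep with numWorkers worker threads
    // and returns the events in the order they were recorded.
    static List<String> runCheckpoint(long superstep, int numWorkers) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        // The extra barrier: one count per worker that must finish writing.
        CountDownLatch checkpointWritten = new CountDownLatch(numWorkers);
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < numWorkers; i++) {
            final int id = i;
            Thread worker = new Thread(() -> {
                // Worker writes its checkpoint data before computing.
                events.add("worker" + id + ":storeCheckpoint(" + superstep + ")");
                checkpointWritten.countDown(); // signal "checkpoint data written"
                events.add("worker" + id + ":compute(" + superstep + ")");
            });
            workers.add(worker);
            worker.start();
        }

        try {
            // Master waits only for the checkpoint writes, not for compute().
            checkpointWritten.await();
            events.add("master:finalizeCheckpoint(" + superstep + ")");
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : runCheckpoint(3L, 4)) {
            System.out.println(event);
        }
    }
}
```

In this sketch the finalize event always comes after every storeCheckpoint event, while the
compute events may land on either side of it, which is the point of decoupling the two.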
                
> Checkpointing on master is done one superstep later
> ---------------------------------------------------
>
>                 Key: GIRAPH-297
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-297
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-297.patch
>
>
> On workers we store checkpoint X before the compute() calls for superstep X are executed. On the master
> we do it after those compute() calls are executed and after master.compute() for superstep X+1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
