giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?
Date Tue, 11 Sep 2012 18:20:08 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453252#comment-13453252 ]

Eli Reisman commented on GIRAPH-293:
------------------------------------

Looks great, and a lot of work too.

So the idea of this patch is to separate the aggregator-handling code into its own modules,
while still operating on ZooKeeper at this stage?

There is definitely some code duplication between the master and worker handlers. Is this
needed, or will it all change in the handler modules as we move away from ZooKeeper to
network connections? What exactly is the difference between the master and worker handling
code? Could there be a common base class that implements the functions both the master and
worker handlers need, or is the difference hard to factor out?

Does anyone else have any particular problems with this code, or can we commit it?

                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand it, we don't include aggregators in checkpoints because they are kept in ZooKeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint of a previous job [*].
> If this is functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on aggregators either being checkpointed or being reset at every superstep. Neither is happening, but the error is cancelled out by the fact that we are not actually resuming from a checkpoint; we are re-running the job from scratch.

--
This message is automatically generated by JIRA.
