giraph-dev mailing list archives

From "Maja Kabiljo (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?
Date Wed, 12 Sep 2012 08:54:07 GMT


Maja Kabiljo commented on GIRAPH-293:

Thank you for looking!

The goal of the patch was to make checkpointing work; separating out the aggregator code
was a bonus :-)

By code duplication, do you mean the parts which read from and write to ZooKeeper? I didn't pay
much attention to making those parts nice, since they are going away soon. I wanted to minimize
the change and make it as easy to review as possible, so those parts are really just copied
directly from the BspService classes. That's why I keep saying the patch is much smaller than
it looks. The worker and master code differ in a few ways. For example, one writes just
aggregator names and values while the other also writes aggregator class names (and the
opposite for reading). Also, the worker just reads the final values, while the master reads
values from all workers and aggregates them along the way.
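
To make the read/write asymmetry concrete, here is a minimal standalone sketch, not the code from the patch. It assumes a simple sum aggregator over long values and plain byte arrays in place of ZooKeeper nodes; the class and method names (AggregatorCheckpointSketch, writeNamesAndValues, readAndAggregate) are hypothetical. One side writes only aggregator names and values, and the reading side merges the values from all writers as it goes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from Giraph or from the patch.
public class AggregatorCheckpointSketch {

    // Serialize only aggregator names and values (no class names).
    public static byte[] writeNamesAndValues(Map<String, Long> values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(values.size());
            for (Map.Entry<String, Long> entry : values.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the payload of every writer and aggregate along the way.
    // Assumption: every aggregator is a sum of longs.
    public static Map<String, Long> readAndAggregate(List<byte[]> payloads) {
        Map<String, Long> totals = new TreeMap<>();
        try {
            for (byte[] payload : payloads) {
                DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(payload));
                int count = in.readInt();
                for (int i = 0; i < count; i++) {
                    String name = in.readUTF();
                    long value = in.readLong();
                    totals.merge(name, value, Long::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Long> worker1 = new LinkedHashMap<>();
        worker1.put("edges", 3L);
        worker1.put("vertices", 2L);
        Map<String, Long> worker2 = new LinkedHashMap<>();
        worker2.put("edges", 4L);
        worker2.put("vertices", 1L);

        Map<String, Long> totals = readAndAggregate(Arrays.asList(
            writeNamesAndValues(worker1), writeNamesAndValues(worker2)));
        System.out.println(totals); // prints {edges=7, vertices=3}
    }
}
```

The side that also records class names would write one extra string per aggregator, so that a reader could re-create each aggregator before filling in its value.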
> Should aggregators be checkpointed?
> -----------------------------------
>                 Key: GIRAPH-293
>                 URL:
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
> As I understand, we don't include aggregators in checkpoints because they are kept in
> ZooKeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves
> starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should
> be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are
> always reset at each superstep. Neither of these is happening, but the error cancels out with
> the fact that we are not actually resuming from a checkpoint, but re-running the job from
