giraph-dev mailing list archives

From "Maja Kabiljo (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-293) Should aggregators be checkpointed?
Date Wed, 15 Aug 2012 14:28:38 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo updated GIRAPH-293:
--------------------------------

    Attachment: GIRAPH-293.patch

This patch makes aggregators work correctly with checkpointing by saving each aggregator's
name, class, value, and whether it is persistent. I also moved the aggregator-handling code
out of BspServiceWorker and BspServiceMaster into separate classes, since I think it's cleaner
this way and those two classes already do too many different things. That is why the patch
looks big. Later, with GIRAPH-273, the AggregatorHandler classes should become more
independent of BspServices.
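
To illustrate the kind of record described above (name, class, value, persistent flag), here
is a minimal sketch using only java.io. It is not the code in the patch: the class
AggregatorCheckpointSketch, its Entry type, and the byte layout are all hypothetical, and a
plain long stands in for the real aggregated value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Giraph's actual classes or checkpoint format. */
final class AggregatorCheckpointSketch {

  /** One aggregator as it might be recorded in a checkpoint. */
  static final class Entry {
    public final String name;        // aggregator name
    public final String className;   // aggregator class, stored by name
    public final boolean persistent; // whether the value survives across supersteps
    public final long value;         // assumption: a long stands in for the real value

    public Entry(String name, String className, boolean persistent, long value) {
      this.name = name;
      this.className = className;
      this.persistent = persistent;
      this.value = value;
    }
  }

  /** Writes a count followed by one record per aggregator. */
  static byte[] write(List<Entry> entries) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(entries.size());
      for (Entry e : entries) {
        out.writeUTF(e.name);
        out.writeUTF(e.className);
        out.writeBoolean(e.persistent);
        out.writeLong(e.value);
      }
    }
    return bytes.toByteArray();
  }

  /** Reads the records back in the same field order used by write(). */
  static List<Entry> read(byte[] data) throws IOException {
    List<Entry> entries = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String name = in.readUTF();
        String className = in.readUTF();
        boolean persistent = in.readBoolean();
        long value = in.readLong();
        entries.add(new Entry(name, className, persistent, value));
      }
    }
    return entries;
  }
}
```

Storing the persistent flag next to the value is what would let a restarted job decide which
aggregators to restore and which to reset, assuming a layout along these lines.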

I added tests for aggregator serialization and for manually restarting from a checkpoint (the
latter also relies on the recent GIRAPH-296 and GIRAPH-298 changes working). The patch passes
mvn verify and the tests in pseudo-distributed mode.
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch
>
>
> As I understand it, we don't include aggregators in checkpoints because they are kept in
> ZooKeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves
> starting a new job from a checkpoint from a previous job*.
> If this is functionality we want going forward, then persistent aggregators should
> be checkpointed.
> [*] That test relies on either aggregators being checkpointed or their always being reset
> at each superstep. Neither is happening, but the error is canceled out by the fact that we
> are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
