giraph-dev mailing list archives

From "Maja Kabiljo" <majakabi...@fb.com>
Subject Re: Review Request 23989: Improve checkpointing
Date Fri, 15 Aug 2014 17:42:54 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23989/#review50747
-----------------------------------------------------------

Ship it!

- Maja Kabiljo


On Aug. 15, 2014, 5:37 p.m., Sergey Edunov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23989/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2014, 5:37 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Repository: giraph-git
> 
> 
> Description
> -------
> 
> We need to address some issues with checkpointing:
> 1) worker2worker messages are not saved
> 2) BspServiceWorker does not compile under hadoop_0.23 profile
> 3) it would be nice to be able to manually checkpoint and stop any job at any point in time.
> 
> Changes:
> 
> 1) worker2worker messages: fixed by serializing the current worker-to-worker messages (they are a list of Writables, so the class information has to be written as well)
> 2) Compilation issues fixed
> 3) Checkpointing can now be triggered by creating the /_checkpointAndStop node in ZooKeeper (the same way _haltComputation works). After that, the job's behavior is determined by the registered GiraphJobRetryChecker. By default, the job is checkpointed at the end of the current superstep and then halted. You can override this behavior by making shouldRestartCheckpoint() return true; in that case the job is restarted immediately after being checkpointed.
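Change 1 hinges on a detail worth spelling out: because the saved messages form a list of arbitrary Writables, whoever reads the checkpoint cannot know which concrete class to instantiate unless the writer records it. Below is a minimal, self-contained Java sketch of that idea, not the code in this patch. `SimpleWritable` is a hypothetical stand-in for Hadoop's `Writable` (same `write`/`readFields` pair), and `LongMessage`/`TextMessage` are made-up message types. Each element is written as its class name followed by its own fields and read back via reflection, which assumes every message class has a public no-argument constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch: checkpointing a heterogeneous list of Writable-like messages. */
final class WritableListCheckpoint {

  /** Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods). */
  public interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Made-up message type; the public no-arg constructor is needed for reflection. */
  public static class LongMessage implements SimpleWritable {
    public long value;
    public LongMessage() { }
    public LongMessage(long value) { this.value = value; }
    @Override public void write(DataOutput out) throws IOException { out.writeLong(value); }
    @Override public void readFields(DataInput in) throws IOException { value = in.readLong(); }
  }

  /** A second, unrelated message type, showing why the class name must be stored. */
  public static class TextMessage implements SimpleWritable {
    public String text;
    public TextMessage() { }
    public TextMessage(String text) { this.text = text; }
    @Override public void write(DataOutput out) throws IOException { out.writeUTF(text); }
    @Override public void readFields(DataInput in) throws IOException { text = in.readUTF(); }
  }

  /** Writes the list size, then for each element its class name followed by its fields. */
  static void writeList(List<? extends SimpleWritable> list, DataOutput out)
      throws IOException {
    out.writeInt(list.size());
    for (SimpleWritable w : list) {
      out.writeUTF(w.getClass().getName());
      w.write(out);
    }
  }

  /** Reads a list produced by writeList, instantiating each element from its class name. */
  static List<SimpleWritable> readList(DataInput in) throws IOException {
    int size = in.readInt();
    List<SimpleWritable> result = new ArrayList<>(size);
    for (int i = 0; i < size; i++) {
      String className = in.readUTF();
      SimpleWritable w;
      try {
        w = (SimpleWritable) Class.forName(className).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot instantiate " + className, e);
      }
      w.readFields(in);
      result.add(w);
    }
    return result;
  }

  /** Serializes and deserializes in memory; IOExceptions are rethrown unchecked. */
  static List<SimpleWritable> roundTrip(List<? extends SimpleWritable> list) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream dataOut = new DataOutputStream(bytes);
      writeList(list, dataOut);
      dataOut.flush();
      return readList(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Writing the class name per element keeps the format self-describing, at the cost of a few extra bytes for every message.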
> 
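The control flow in change 3 can be modeled in a few lines. This Java sketch is an in-memory illustration under stated assumptions, not Giraph's implementation: a `Set<String>` stands in for ZooKeeper, `RetryChecker` is a hypothetical simplification of `GiraphJobRetryChecker` that exposes only a boolean `shouldRestartCheckpoint()`, and the class only decides what should happen at the end of a superstep. It does not write the checkpoint or relaunch the job.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of the end-of-superstep decision for a manual checkpoint request. */
final class CheckpointAndStopSketch {

  /** Marker node name taken from the review description. */
  static final String CHECKPOINT_AND_STOP_NODE = "/_checkpointAndStop";

  /** Hypothetical, simplified stand-in for GiraphJobRetryChecker. */
  interface RetryChecker {
    /** Return true to restart from the new checkpoint instead of halting. */
    boolean shouldRestartCheckpoint();
  }

  /** Mirrors the described default: checkpoint, then halt. */
  static final RetryChecker DEFAULT_CHECKER = () -> false;

  enum Action { CONTINUE, CHECKPOINT_AND_HALT, CHECKPOINT_AND_RESTART }

  /** In-memory stand-in for the job's ZooKeeper nodes (hypothetical). */
  private final Set<String> nodes = new HashSet<>();
  private final RetryChecker checker;

  CheckpointAndStopSketch(RetryChecker checker) { this.checker = checker; }

  /** What an operator does to request a checkpoint: create the marker node. */
  void requestCheckpointAndStop() { nodes.add(CHECKPOINT_AND_STOP_NODE); }

  /** Evaluated once at the end of every superstep. */
  Action endOfSuperstep() {
    if (!nodes.contains(CHECKPOINT_AND_STOP_NODE)) {
      return Action.CONTINUE;
    }
    // Request present: the real job would checkpoint here; the checker picks what follows.
    return checker.shouldRestartCheckpoint()
        ? Action.CHECKPOINT_AND_RESTART
        : Action.CHECKPOINT_AND_HALT;
  }
}
```

With the default checker, a job that sees the marker node checkpoints and halts; plugging in a checker whose `shouldRestartCheckpoint()` returns true turns the same request into a checkpoint followed by an immediate restart.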
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 02577b9 
>   giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedService.java ff3e427 
>   giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java e5b7cf3
>   giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java e5d0ae1
>   giraph-core/src/main/java/org/apache/giraph/bsp/CheckpointStatus.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/bsp/SuperstepState.java c384fbf
>   giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java a92cd1c
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 0424a47
>   giraph-core/src/main/java/org/apache/giraph/graph/FinishedSuperstepStats.java c351778
>   giraph-core/src/main/java/org/apache/giraph/graph/GlobalStats.java bc56c9c
>   giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java 6ebb002
>   giraph-core/src/main/java/org/apache/giraph/job/DefaultGiraphJobRetryChecker.java 0cab86c
>   giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java 4a1f02e
>   giraph-core/src/main/java/org/apache/giraph/job/GiraphJobRetryChecker.java 53a800e
>   giraph-core/src/main/java/org/apache/giraph/job/HadoopUtils.java 9530fd6
>   giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java e129390
>   giraph-core/src/main/java/org/apache/giraph/master/MasterThread.java 0635210
>   giraph-core/src/main/java/org/apache/giraph/utils/CheckpointingUtils.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java 763f59d
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java d2d24ee
>   giraph-core/src/test/java/org/apache/giraph/utils/TestWritableUtils.java PRE-CREATION
>   giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java 2939af7 
>   pom.xml 672ec44 
> 
> Diff: https://reviews.apache.org/r/23989/diff/
> 
> 
> Testing
> -------
> 
> Ran PageRank; will keep testing with different jobs.
> 
> 
> Thanks,
> 
> Sergey Edunov
> 
>

