giraph-dev mailing list archives

From "Maja Kabiljo" <majakabi...@fb.com>
Subject Re: Review Request 23989: Improve checkpointing
Date Thu, 31 Jul 2014 04:20:40 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23989/#review49201
-----------------------------------------------------------



giraph-core/src/main/java/org/apache/giraph/job/HadoopUtils.java
<https://reviews.apache.org/r/23989/#comment86080>

    How did you verify this is correct now? I remember you saying hadoop_facebook has both methods but one returns null. Can that be the case for some other versions of Hadoop?


- Maja Kabiljo


On July 28, 2014, 5:25 p.m., Sergey Edunov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23989/
> -----------------------------------------------------------
> 
> (Updated July 28, 2014, 5:25 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Repository: giraph-git
> 
> 
> Description
> -------
> 
> We need to address some issues with checkpointing:
> 1) worker2worker messages are not saved
> 2) BspServiceWorker does not compile under hadoop_0.23 profile
> 3) it would be nice to be able to manually checkpoint and stop any job at any point in time.
> 
> Changes:
> 
> 1) worker2worker messages fixed by serializing currentworkertoworker messages (it is a list of Writables, so I had to write class information as well)
> 2) Compilation issues fixed
> 3) Checkpointing can now be triggered by creating the /_checkpointAndStop node in ZooKeeper (the same way _haltComputation works). After that, the behavior of the job is determined by the registered GiraphJobRetryChecker. By default, the job is checkpointed at the end of the current superstep and halted. You can override this behavior by making shouldRestartCheckpoint() return true; in that case the job is restarted immediately after being checkpointed.
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 02577b9 
>   giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedService.java ff3e427 
>   giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java e5b7cf3
>   giraph-core/src/main/java/org/apache/giraph/bsp/CheckpointStatus.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/bsp/SuperstepState.java c384fbf 
>   giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java 29488fc 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 0424a47 
>   giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java 684f4eb 
>   giraph-core/src/main/java/org/apache/giraph/job/DefaultGiraphJobRetryChecker.java 0cab86c
>   giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java 4a1f02e
>   giraph-core/src/main/java/org/apache/giraph/job/GiraphJobRetryChecker.java 53a800e
>   giraph-core/src/main/java/org/apache/giraph/job/HadoopUtils.java 9530fd6
>   giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java e129390 
>   giraph-core/src/main/java/org/apache/giraph/master/MasterThread.java 0635210 
>   giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java 3f8382e 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java d2d24ee 
>   giraph-core/src/test/java/org/apache/giraph/utils/TestWritableUtils.java PRE-CREATION
>   giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java 2939af7
>   pom.xml de25499 
> 
> Diff: https://reviews.apache.org/r/23989/diff/
> 
> 
> Testing
> -------
> 
> Ran PageRank; will keep testing with different jobs.
> 
> 
> Thanks,
> 
> Sergey Edunov
> 
>
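
For readers following change 1 in the quoted description: when a list holds Writables of mixed concrete types, the reader cannot know which class to instantiate unless the class is recorded next to each element. The following is a minimal sketch of that idea, not the code from the patch. SimpleWritable, IntMessage, TextMessage and ClassTaggedListSketch are hypothetical names invented here so that the example needs only the JDK; the real patch works with Hadoop's Writable and Giraph's WritableUtils, whose exact method names are not shown in this message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ClassTaggedListSketch {
  /** Hypothetical stand-in for Hadoop's Writable (assumption, JDK only). */
  public interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Hypothetical message type carrying one int. */
  public static class IntMessage implements SimpleWritable {
    public int value;
    public IntMessage() { }
    public IntMessage(int value) { this.value = value; }
    @Override public void write(DataOutput out) throws IOException {
      out.writeInt(value);
    }
    @Override public void readFields(DataInput in) throws IOException {
      value = in.readInt();
    }
  }

  /** Hypothetical message type carrying one string. */
  public static class TextMessage implements SimpleWritable {
    public String text = "";
    public TextMessage() { }
    public TextMessage(String text) { this.text = text; }
    @Override public void write(DataOutput out) throws IOException {
      out.writeUTF(text);
    }
    @Override public void readFields(DataInput in) throws IOException {
      text = in.readUTF();
    }
  }

  /** Writes the element count, then each element prefixed by its class name. */
  public static void writeList(List<? extends SimpleWritable> items,
      DataOutput out) throws IOException {
    out.writeInt(items.size());
    for (SimpleWritable item : items) {
      out.writeUTF(item.getClass().getName());
      item.write(out);
    }
  }

  /** Reads the list back, instantiating each element from its recorded class. */
  public static List<SimpleWritable> readList(DataInput in) throws IOException {
    int size = in.readInt();
    List<SimpleWritable> items = new ArrayList<>();
    for (int i = 0; i < size; i++) {
      String className = in.readUTF();
      try {
        SimpleWritable item = Class.forName(className)
            .asSubclass(SimpleWritable.class)
            .getDeclaredConstructor()
            .newInstance();
        item.readFields(in);
        items.add(item);
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot instantiate " + className, e);
      }
    }
    return items;
  }

  /** Serializes to an in-memory buffer and reads the list back. */
  public static List<SimpleWritable> roundTrip(List<? extends SimpleWritable> items) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeList(items, new DataOutputStream(bytes));
      return readList(new DataInputStream(
          new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Writing the full class name per element is the simplest encoding; a list dominated by one type could instead store a small class table once and an index per element, at the cost of extra bookkeeping.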

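The decision described in change 3 of the quoted description (halt by default, restart if shouldRestartCheckpoint() returns true) can be illustrated with a small sketch. This is an assumption-laden illustration, not Giraph code: RetryChecker is a hypothetical one-method stand-in for GiraphJobRetryChecker, and Outcome and afterCheckpoint are names invented here.

```java
public class CheckpointAndStopSketch {
  /** Hypothetical stand-in exposing only the method named in the description. */
  public interface RetryChecker {
    boolean shouldRestartCheckpoint();
  }

  /** Hypothetical outcomes once the requested checkpoint has been written. */
  public enum Outcome { HALT, RESTART_FROM_CHECKPOINT }

  /** Default behavior per the description: checkpoint, then halt. */
  public static final RetryChecker DEFAULT = () -> false;

  /** Overridden behavior: restart right after the checkpoint. */
  public static final RetryChecker ALWAYS_RESTART = () -> true;

  /** Chooses what happens after the checkpoint at the end of the superstep. */
  public static Outcome afterCheckpoint(RetryChecker checker) {
    return checker.shouldRestartCheckpoint()
        ? Outcome.RESTART_FROM_CHECKPOINT
        : Outcome.HALT;
  }
}
```

Under these assumptions, afterCheckpoint(DEFAULT) yields HALT and afterCheckpoint(ALWAYS_RESTART) yields RESTART_FROM_CHECKPOINT.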
