incubator-hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: [Discussion] Refactor bsp() for recovery procedure
Date Mon, 19 Sep 2011 05:55:07 GMT
Hi ChiaHung,

I would not split this into several classes like SuperStep1 or SuperStep2,
and the chaining sounds a bit strange to me.
But I do think your idea is cool: a BSPSuperstep starts after a sync phase
and ends with one (easier for the user, because the workflow is simpler).

Here is my proposal:

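> BSPSuperstep step = null;
> int rollbackSuperStep = conf.getInt("bsp.rollback.superstep", -1);
> if (rollbackSuperStep > -1) {
>    // recovery: restore the checkpointed superstep
>    step = BSPSuperstep.getSuperStep(rollbackSuperStep);
> }
> boolean halted = false;
> while (!halted) {
>    sync();
>    if (step == null) {
>       step = new BSPSuperstep(CURRENT_NUMBER_OF_SUPERSTEP);
>    }
>    step.compute(messages);   // messages delivered by the last sync()
>    save(step);               // checkpoint the finished superstep
>    step = null;
>    halted = checkHalted();
> }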
>

I know that diverges a lot from your idea. Maybe the sync has to go at the
tail of the loop instead.
What do you think?
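A minimal, single-peer sketch of the loop proposed above, with sync() moved to the tail and recovery done by rolling back to a saved superstep. It is not Hama API: BSPSuperstep, the in-memory `checkpoints` map, `run()` and its `failAt` crash switch are hypothetical stand-ins (a real implementation would checkpoint to HDFS, and sync() would be a barrier across peers). The stub superstep simply increments every message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SuperstepLoopSketch {

  // Hypothetical stand-in for BSPSuperstep: the work done between two barriers.
  // This stub just increments every integer message.
  static class BSPSuperstep {
    final int number; // superstep number (kept only for illustration)

    BSPSuperstep(int number) {
      this.number = number;
    }

    List<Integer> compute(List<Integer> messages) {
      List<Integer> out = new ArrayList<>();
      for (int m : messages) {
        out.add(m + 1);
      }
      return out;
    }
  }

  // Hypothetical checkpoint store: superstep number -> messages it produced.
  static final Map<Integer, List<Integer>> checkpoints = new HashMap<>();

  // Runs supersteps until maxSupersteps is reached (stands in for checkHalted()).
  // If rollbackSuperStep > -1, resumes after that superstep from its checkpoint.
  // failAt simulates a crash at the start of that superstep (-1 = no crash).
  static List<Integer> run(List<Integer> input, int maxSupersteps,
                           int rollbackSuperStep, int failAt) {
    int current = 0;
    List<Integer> messages = input;
    if (rollbackSuperStep > -1) {
      messages = checkpoints.get(rollbackSuperStep); // restore saved messages
      current = rollbackSuperStep + 1;
    }
    while (current < maxSupersteps) {
      if (current == failAt) {
        throw new IllegalStateException("crash in superstep " + current);
      }
      BSPSuperstep step = new BSPSuperstep(current);
      messages = step.compute(messages);
      checkpoints.put(current, messages); // save(step)
      // sync() would be called here, at the tail; a no-op for a single peer
      current++;
    }
    return messages;
  }

  public static void main(String[] args) {
    List<Integer> input = List.of(0, 10);
    try {
      run(input, 5, -1, 3);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // crash in superstep 3
    }
    // Restart from the last completed superstep (2).
    System.out.println(run(input, 5, 2, -1)); // [5, 15]
  }
}
```

Only supersteps 3 and 4 are recomputed after the simulated crash, and the result matches an uninterrupted five-superstep run.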

2011/9/19 ChiaHung Lin <chl501@nuk.edu.tw>
>
> Currently we have bsp(), in which users write the code that performs their
> tasks. For instance,
>
> ... bsp() ...{
>   ... // some computation
>   sync();
>   ... // some other computation
>   sync();
>   ...
> }
>
> However, this makes recovery difficult. First, the checkpointed messages
> must be restored so that the computation can resume from where it failed;
> second, the recovery procedure needs to know from which superstep to
> restart. With the current bsp(), the common choice seems to be
> preprocessing, but this may not be good, because when something goes wrong
> internally, it is not easy to find the problem.
>
> I came up with an alternative, but it would change our current procedure,
> so I think it would be good to discuss it first. The proposal is as
> follows:
>
> 1. We divide bsp() into smaller computation units, called e.g. step() or
> superstep(), in which users still write their own logic.
>
> 2. In main, the user composes the order of the supersteps.
>
> ... class Superstep1 extends BSPSuperstep {
>   ... superstep() ... {...}
> }
> ... class Superstep2 extends BSPSuperstep {
>   ... superstep() ... {...}
> }
>
> BSPJob bsp = new BSPJob(...);
> bsp.compose(Superstep1.class).compose(Superstep2.class)...;
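One way the chained `compose(...)` call could work, sketched under assumptions: the job only records the superstep classes in order and instantiates them on demand, so a recovering task can create fresh instances. BSPJob and BSPSuperstep here are simplified hypothetical classes, not Hama's actual BSPJob, and each superstep is assumed to have a public no-argument constructor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ComposeSketch {

  // Hypothetical base class: one superstep of user logic.
  public abstract static class BSPSuperstep {
    public abstract void superstep(List<String> log);
  }

  public static class Superstep1 extends BSPSuperstep {
    @Override
    public void superstep(List<String> log) {
      log.add("step1");
    }
  }

  public static class Superstep2 extends BSPSuperstep {
    @Override
    public void superstep(List<String> log) {
      log.add("step2");
    }
  }

  // Hypothetical job that only remembers the order of the superstep classes.
  public static class BSPJob {
    private final List<Class<? extends BSPSuperstep>> order = new ArrayList<>();

    // Returns this so calls can be chained: compose(A.class).compose(B.class)
    public BSPJob compose(Class<? extends BSPSuperstep> stepClass) {
      order.add(stepClass);
      return this;
    }

    // Creates fresh superstep instances in composition order via reflection.
    public List<BSPSuperstep> supersteps() throws ReflectiveOperationException {
      List<BSPSuperstep> steps = new ArrayList<>();
      for (Class<? extends BSPSuperstep> c : order) {
        steps.add(c.getDeclaredConstructor().newInstance());
      }
      return Collections.unmodifiableList(steps);
    }
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    BSPJob job = new BSPJob().compose(Superstep1.class).compose(Superstep2.class);
    List<String> log = new ArrayList<>();
    for (BSPSuperstep step : job.supersteps()) {
      step.superstep(log);
    }
    System.out.println(log); // [step1, step2]
  }
}
```

Storing `Class` objects rather than instances keeps the job description free of per-run state; the cost is reflection and the no-argument constructor requirement.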
>
> Then, during recovery, BSPTask.run() can do:
>
> List<BSPSuperstep> steps = BSPJob.supersteps();
>
> for(BSPSuperstep step: steps) {
>   if(checkpointed) {
>     // restore checkpointed messages, e.g. by adding the checkpointed msgs
>     // (in HDFS) back to the queues
>   }
>   step.superstep(...);
>   step.sync();
> }
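The recovery loop above can be sketched for a single peer, assuming each composed superstep is modelled as a hypothetical `UnaryOperator<Integer>` applied to every queued message, and that the checkpoint (an in-memory map standing in for HDFS) stores the outgoing messages of each completed superstep. On restart, the loop skips the completed supersteps and puts the checkpointed messages back on the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class RecoverySketch {

  // Hypothetical checkpoint store: superstep index -> outgoing messages.
  static final Map<Integer, List<Integer>> checkpoints = new HashMap<>();

  // Runs the composed supersteps in order. If a checkpoint exists for
  // lastCompleted, its messages are restored to the queue and execution
  // resumes at lastCompleted + 1. failAt simulates a crash (-1 = none).
  static List<Integer> run(List<UnaryOperator<Integer>> steps, List<Integer> input,
                           int lastCompleted, int failAt) {
    Deque<Integer> queue = new ArrayDeque<>(input);
    int start = 0;
    if (lastCompleted > -1 && checkpoints.containsKey(lastCompleted)) {
      queue = new ArrayDeque<>(checkpoints.get(lastCompleted)); // restore msgs
      start = lastCompleted + 1;
    }
    for (int i = start; i < steps.size(); i++) {
      if (i == failAt) {
        throw new IllegalStateException("failure in superstep " + i);
      }
      List<Integer> outgoing = new ArrayList<>();
      while (!queue.isEmpty()) {
        outgoing.add(steps.get(i).apply(queue.poll())); // step.superstep(...)
      }
      checkpoints.put(i, outgoing);       // checkpoint before the barrier
      queue = new ArrayDeque<>(outgoing); // step.sync(): deliver the messages
    }
    return new ArrayList<>(queue);
  }

  public static void main(String[] args) {
    List<UnaryOperator<Integer>> steps =
        List.<UnaryOperator<Integer>>of(x -> x + 1, x -> x * 2, x -> x - 3);
    try {
      run(steps, List.of(1, 2), -1, 2);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // failure in superstep 2
    }
    System.out.println(run(steps, List.of(1, 2), 1, -1)); // [1, 3]
  }
}
```

With the supersteps x+1, x*2, x-3 and input [1, 2], a run that fails in superstep 2 resumes from the checkpoint of superstep 1 and yields [1, 3], the same as an uninterrupted run. The skip only works because the composed order is fixed and known before the job starts.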
>
> The advantage is that the recovery procedure becomes easier.
> The disadvantage may be that the client program needs to state the order of
> the supersteps explicitly.
>
> Any thoughts?
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan



--
Thomas Jungblut
Berlin

mobile: 0170-3081070

business: thomas.jungblut@testberichte.de
private: thomas.jungblut@gmail.com
