incubator-hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ChiaHung Lin" <chl...@nuk.edu.tw>
Subject Re: [Discussion] Refactor bsp() for recovery procedure
Date Wed, 21 Sep 2011 09:34:51 GMT
I will try to provide a patch so that we can have a baseline for discussion. 

-----Original message-----
From:Edward J. Yoon <edwardyoon@apache.org>
To:hama-dev@incubator.apache.org,chl501@nuk.edu.tw
Date:Tue, 20 Sep 2011 15:49:20 +0900
Subject:Re: [Discussion] Refactor bsp() for recovery procedure

> The disadvantage may be the client programme need to explicitly tell the order of superstep.

If user want to call a sync() method repeatedly in the loops while or
until a condition is true, how to program it?

bsp() {

  while (condition is true) {
    doLocalComputation();
    communicationWith(others);
    sync();
  }

}

I think, current BSP programming interface is very good. If it's just
only for recovery, we have to find another way.

2011/9/19 ChiaHung Lin <chl501@nuk.edu.tw>:
> Currently we have bsp() where users can code for performing thier tasks. For instance,
>
> ... bsp() ...{
>   ... // some computation
>   sync();
>   ... // some other computation
>   sync();
>   ...
> }
>
> However, this is difficult for recovery because 1st, it requires checkpointed messages
to be recovered so that the computation can be resumed from where it fails; 2nd, the recovery
procedure needs to know from which super step to restart. With the current bsp(), it seems
a common choice is preprocessing; but this may not be good because when internally something
goes wrong it, it is not easy to find out the problem.
>
> I come up with an alternative method but this would have change to the way of our current
procedure. So I think it would be good to discuss it first. It is proposed as below:
>
> 1. we divide bsp() into smaller computation unit called e.g. step() or superstep(), within
which user still write their own logic.
>
> 2. in main, user composes the order of supersteps.
>
> ... class Superstep1 extends BSPSuperstep {
>   ... superstep() ... {...}
> }
> ... class Superstep2 extends BSPSuperstep {
>   ... superstep() ... {...}
> }
>
> BSPJob bsp = new BSP(...);
> bsp.compose(Superstep1.class).compose(Superstep2.class)...;
>
> Therefore, when recovery, in BSPTask run() we can have
>
> List<BSPSuperstep> steps = BSPJob.supersteps();
>
> for(BSPSuperstep step: steps) {
>   if(checkpointed) {
>     // restore checkpointed messages e.g. adding checkpointed msg (in hdfs) back to
queues
>   }
>   step.superstep(...);
>   step.sync();
> }
>
> The advantage is easier for recovery procedure.
> The disadvantage may be the client programme need to explicitly tell the order of superstep.
>
> Any thought?
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan

Mime
View raw message