incubator-hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ChiaHung Lin" <chl...@nuk.edu.tw>
Subject [Discussion] Refactor bsp() for recovery procedure
Date Mon, 19 Sep 2011 03:34:29 GMT
Currently we have bsp() where users can code for performing thier tasks. For instance, 

... bsp() ...{
   ... // some computation
   sync();
   ... // some other computation
   sync();
   ...   
}

However, this is difficult for recovery because 1st, it requires checkpointed messages to
be recovered so that the computation can be resumed from where it fails; 2nd, the recovery
procedure needs to know from which super step to restart. With the current bsp(), it seems
a common choice is preprocessing; but this may not be good because when internally something
goes wrong it, it is not easy to find out the problem. 

I come up with an alternative method but this would have change to the way of our current
procedure. So I think it would be good to discuss it first. It is proposed as below:

1. we divide bsp() into smaller computation unit called e.g. step() or superstep(), within
which user still write their own logic. 

2. in main, user composes the order of supersteps. 

... class Superstep1 extends BSPSuperstep {
   ... superstep() ... {...}
}
... class Superstep2 extends BSPSuperstep {
   ... superstep() ... {...}
}

BSPJob bsp = new BSP(...);
bsp.compose(Superstep1.class).compose(Superstep2.class)...;

Therefore, when recovery, in BSPTask run() we can have 

List<BSPSuperstep> steps = BSPJob.supersteps();

for(BSPSuperstep step: steps) {
   if(checkpointed) { 
     // restore checkpointed messages e.g. adding checkpointed msg (in hdfs) back to queues
   }
   step.superstep(...);
   step.sync();
}

The advantage is easier for recovery procedure.
The disadvantage may be the client programme need to explicitly tell the order of superstep.
 

Any thought?

--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan

Mime
View raw message