hama-dev mailing list archives

From "ChiaHung Lin" <chl...@nuk.edu.tw>
Subject Re: Reset Input RecordReader
Date Tue, 29 Nov 2011 07:22:13 GMT
I slightly disagree with the "easily recoverable" part. Consider the following code snippet:

bsp() {
  var i, j, k;
  compute1()
  sync()
  compute2()
  sync()
  for(...) {
    computex(i, k)
    sync()
    computey(j)
    sync()
  } // for
}

Suppose it has 43 supersteps, data was checkpointed at the 23rd superstep, and then the bsp
task crashes. The steps to recover may include 1.) analyzing the source to count the sync()
calls needed to reach the 23rd superstep, and 2.) having the main thread find a way to jump
to that point in the function, feed in the checkpointed data, and perhaps also ensure that
doing so does not violate atomicity with variables somewhere else.

The reason I think recovery might be easier with a somewhat finer-grained unit is that we
can recover by feeding checkpointed messages back to a superstep directly (as below). (Of
course this is not the only way; we can discuss and probably find a better solution.)

// in framework
Superstep step ...;
if(recovered) {
  step = supersteps.get(22)
  step.recover(checkpointedData)
}
...


superstep() {
  if(recovered) {
    ... getCheckpointedMessage()
    // do something
  }
}
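
To make the idea above more concrete, here is a minimal Java sketch. It is only an
illustration: Superstep and SuperstepRunner are hypothetical names, not existing Hama
classes, and messages are modeled as plain strings. It assumes the checkpoint holds the
messages that were delivered to the 23rd superstep (index 22), so recovery only has to
hand them to that step instead of counting sync() calls inside bsp().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical unit of computation: one superstep (not an existing Hama API). */
interface Superstep {
    /** Consumes the messages delivered to this step and returns the messages for the next one. */
    List<String> compute(List<String> incoming);
}

/** Hypothetical framework-side driver that knows which superstep it is running. */
final class SuperstepRunner {
    private final List<Superstep> supersteps;

    SuperstepRunner(List<Superstep> supersteps) {
        this.supersteps = new ArrayList<>(supersteps);
    }

    /**
     * Runs supersteps [from, toExclusive) starting with the given messages.
     * Normal execution is run(0, size, empty list); recovery is
     * run(checkpointIndex, size, checkpointedMessages).
     */
    List<String> run(int from, int toExclusive, List<String> messages) {
        List<String> current = new ArrayList<>(messages);
        for (int i = from; i < toExclusive; i++) {
            current = supersteps.get(i).compute(current);
            // In a real BSP framework a barrier sync would take place here.
        }
        return current;
    }

    int size() {
        return supersteps.size();
    }
}
```

With 43 supersteps, recovery in this sketch would be runner.run(22, runner.size(),
checkpointedMessages). Because the framework owns the step index, no source analysis is
needed to find where to resume. State kept in user fields outside the messages would
still have to be checkpointed separately.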

As for sync(), it is not necessary to separate sync() from the superstep; we could provide
functions that let users specify e.g. syncBefore(), syncAfter(), etc. when a superstep is called.
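
One possible shape for such hooks is sketched below in Java. Again, this is an
assumption for discussion only: SyncAwareSuperstep and SyncAwareRunner are made-up names,
the barrier is modeled as a plain Runnable, and the defaults (no sync before, sync after)
are just one choice.

```java
import java.util.List;

/** Hypothetical superstep that declares when the framework should synchronize. */
interface SyncAwareSuperstep {
    void compute();

    /** Whether the framework should run a barrier sync before compute(). */
    default boolean syncBefore() {
        return false;
    }

    /** Whether the framework should run a barrier sync after compute(). */
    default boolean syncAfter() {
        return true;
    }
}

/** Hypothetical driver: the framework, not user code, decides when to call the barrier. */
final class SyncAwareRunner {
    private final Runnable barrier;

    SyncAwareRunner(Runnable barrier) {
        this.barrier = barrier;
    }

    void run(List<SyncAwareSuperstep> steps) {
        for (SyncAwareSuperstep step : steps) {
            if (step.syncBefore()) {
                barrier.run();
            }
            step.compute();
            if (step.syncAfter()) {
                barrier.run();
            }
        }
    }
}
```

Since every barrier call goes through the runner, the framework can count completed
syncs itself, which is the information a checkpoint needs.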


-----Original message-----
From:Thomas Jungblut <thomas.jungblut@googlemail.com>
To:hama-dev@incubator.apache.org
Date:Tue, 29 Nov 2011 07:24:45 +0100
Subject:Re: Reset Input RecordReader

Yep, it is just a reopen. Let's call it like this. I'm going to make up a
patch later.
Therefore it is just the read of the same assigned split. So no problem ;)

Yes, BSP is not atomic, but as long as the user sticks with the
communication and the stuff from IO (not using fields in a hashmap like
pagerank or so), this is always easily recoverable.
But you cannot express every algorithm with just one sync at the end of a
function, so BSP() must be somewhere anyway.
For me it is a question of algorithm design: as long as you use major parts
from our framework, this is fail-safe.


2011/11/29 ChiaHung Lin <chl501@nuk.edu.tw>

> Does it mean that for each iteration the computation (code within the bsp
> function) needs to read the same or different input?
>
> I ask because this seems related to what I mentioned previously regarding
> the rework of the bsp function (providing a smaller computation unit,
> e.g. superstep).
>
> bsp(...) {
> sync()
> // superstep 1
> // read from hdfs
> // compute1()
> // send messages ...
> sync()
> // superstep 2
> // read from/ write pvfs
> // compute2()
> sync()
> // superstep 3
> // write to cassandra
> // compute3()
> sync()
> ...
> }
>
> The reason is that bsp() consists of several supersteps, and for each
> iteration users probably want to read from/ write to different
> input/ output. This is a pattern. Although the current bsp() is flexible,
> allowing users to write whatever they want within bsp(), the disadvantages
> I observe include 1.) recovery is difficult and 2.) a lot of code gets
> mixed together within one function.
>
> The first one may be overcome by source code instrumentation but that is
> not a good solution because users do not know what/ where goes wrong when
> bsp() doesn't function well.
>
> The second one is a bit minor, and the code can be e.g. reorganized in a
> more modular way. But the result looks similar to what we would get by
> providing e.g. superstep().
>
> Just some thoughts.
>
> -----Original message-----
> From:Thomas Jungblut <thomas.jungblut@googlemail.com>
> To:hama-dev@incubator.apache.org
> Date:Tue, 29 Nov 2011 04:39:38 +0100
> Subject:Reset Input RecordReader
>
> Hi all,
>
> I need some kind of reset-logic for the input of a BSP Job.
> It should be quite easy to add:
> - add a method called resetInput() in BSPPeer
> - in concrete implementation it just closes the input split and opens it
> again
>
> If you're interested why I need this, I'm currently writing a k-means
> clustering in BSP.
> I need to iterate over all vectors from the input and measure distance
> against a set of centers in each superstep, so it would help me to "reset"
> the input.
>
> Do you think I can add this right away into the trunk?
>
> --
> Thomas Jungblut
> Berlin <thomas.jungblut@gmail.com>
>
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>



-- 
Thomas Jungblut
Berlin <thomas.jungblut@gmail.com>


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan
