hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: Reset Input RecordReader
Date Tue, 29 Nov 2011 06:24:45 GMT
Yep, it is just a reopen, so let's call it that. I'm going to put together a
patch later.
It is therefore just a re-read of the same assigned split. So no problem ;)
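
To make the "reset is just a reopen" idea concrete, here is a minimal sketch in plain Java. It is not Hama code: ReopenableInput, readNext(), and ResetInputDemo are hypothetical names, and an in-memory string stands in for the assigned split. Only the method name resetInput() comes from the proposal in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a peer's input; this is NOT the Hama BSPPeer API.
class ReopenableInput implements AutoCloseable {
    private final Supplier<BufferedReader> opener; // knows how to open the assigned split
    private BufferedReader reader;

    ReopenableInput(Supplier<BufferedReader> opener) {
        this.opener = opener;
        this.reader = opener.get();
    }

    /** Returns the next record, or null at the end of the split. */
    String readNext() throws IOException {
        return reader.readLine();
    }

    /** "Reset" is just a reopen: close the reader and open the same split again. */
    void resetInput() throws IOException {
        reader.close();
        reader = opener.get();
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}

class ResetInputDemo {
    /** Reads the whole split once per superstep, reopening it between passes. */
    static List<List<String>> readPasses(String split, int supersteps) {
        List<List<String>> passes = new ArrayList<>();
        try (ReopenableInput in =
                 new ReopenableInput(() -> new BufferedReader(new StringReader(split)))) {
            for (int s = 0; s < supersteps; s++) {
                List<String> records = new ArrayList<>();
                for (String line; (line = in.readNext()) != null; ) {
                    records.add(line);
                }
                passes.add(records);
                in.resetInput(); // next superstep starts from the first record again
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return passes;
    }

    public static void main(String[] args) {
        // Two passes over the same two "vectors", as k-means would need.
        System.out.println(readPasses("1.0 2.0\n3.0 4.0\n", 2));
    }
}
```

Since the reopen only touches the split that was already assigned to the peer, every pass sees the same records in the same order.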

Yes, BSP is not atomic, but as long as the user sticks to the
communication and IO facilities (rather than keeping state in fields, e.g. a
HashMap as in PageRank), it is always easily recoverable.
But you cannot express every algorithm with just one sync at the end of a
function, so bsp() must be somewhere anyway.
For me it is a question of algorithm design: as long as you use the major parts
of our framework, this is fail-safe.
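
A rough sketch of that point, in plain Java rather than Hama code: even if each superstep becomes its own unit, something still has to run the units in order and place the barrier between them, and that something plays the role of bsp(). The Superstep interface and SuperstepDriver class are hypothetical, and the "sync" entry is only a placeholder for real barrier synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface, not part of Hama: one unit of work between two syncs.
interface Superstep {
    void compute(List<String> log);
}

class SuperstepDriver {
    /** Plays the role of bsp(): runs each unit in order, with a barrier after each. */
    static List<String> run(List<Superstep> steps) {
        List<String> log = new ArrayList<>();
        for (Superstep step : steps) {
            step.compute(log);
            log.add("sync"); // placeholder for the real barrier synchronization
        }
        return log;
    }

    public static void main(String[] args) {
        List<Superstep> steps = Arrays.asList(
            log -> log.add("read input, compute1, send messages"),
            log -> log.add("compute2"),
            log -> log.add("compute3, write output"));
        System.out.println(run(steps));
    }
}
```

In this shape a recovery mechanism could restart from the last completed unit, which only works if the units keep no hidden state of their own.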


2011/11/29 ChiaHung Lin <chl501@nuk.edu.tw>

> Does this mean that in each iteration the computation (the code within the
> bsp function) needs to read the same input, or a different one?
>
> I ask because this seems related to what I mentioned previously about
> reworking the bsp function (providing a smaller computation unit, e.g. a
> superstep).
>
> bsp(...) {
> sync()
> // superstep 1
> // read from hdfs
> // compute1()
> // send messages ...
> sync()
> // superstep 2
> // read from / write to pvfs
> // compute2()
> sync()
> // superstep 3
> // write to cassandra
> // compute3()
> sync()
> ...
> }
>
> The reason is that bsp() consists of several supersteps, and in each
> iteration users will probably want to read from / write to different
> inputs / outputs. This is a pattern. Although the current bsp() is flexible,
> allowing users to write whatever they want within it, the disadvantages I
> observe include 1.) recovery is difficult, and 2.) a lot of code gets mixed
> together within one function.
>
> The first one may be overcome by source code instrumentation, but that is
> not a good solution because users will not know what goes wrong, or where,
> when bsp() doesn't function well.
>
> The second one is more minor, and the code can be reorganized in a more
> modular way. But that looks similar to what we would get by providing e.g.
> superstep().
>
> Just some thoughts.
>
> -----Original message-----
> From:Thomas Jungblut <thomas.jungblut@googlemail.com>
> To:hama-dev@incubator.apache.org
> Date:Tue, 29 Nov 2011 04:39:38 +0100
> Subject:Reset Input RecordReader
>
> Hi all,
>
> I need some kind of reset-logic for the input of a BSP Job.
> It should be quite easy to add:
> - add a method called resetInput() to BSPPeer
> - in the concrete implementation, it just closes the input split and opens
> it again
>
> If you're interested why I need this, I'm currently writing a k-means
> clustering in BSP.
> I need to iterate over all vectors from the input and measure distance
> against a set of centers in each superstep, so it would help me to "reset"
> the input.
>
> Do you think I can add this right away into the trunk?
>
> --
> Thomas Jungblut
> Berlin <thomas.jungblut@gmail.com>
>
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>



-- 
Thomas Jungblut
Berlin <thomas.jungblut@gmail.com>
