hama-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Felix Halim <felix.ha...@gmail.com>
Subject Re: More documentation on Hama- BSP
Date Wed, 03 Feb 2010 14:08:25 GMT
Of course the second problem is how to partition the graph so that the
sync is minimized...
Because as far as I know, when the data has been partitioned to some machine,
it never moves from those machine (other machines get them through
sync), am I right?.
Or maybe the data may be moved among machines if it leads to lesser
number of sync?

Anyway, I'm currently researching on running max-flow algorithm on top
of MapReduce. So I was looking whether BSP is a good add on. It turns
out that Hama's BSP is quite different from MapReduce. Not exactly in
form of map() and reduce() function anymore...

I was thinking BSP as an add on to MapReduce: In current MR, the map
function is supplied one <key,value> at a time. If the map() function
is modified so that it takes in a RANGE of <key,value> pairs of
certain range, then in some sense it becomes BSP (as explained in my
comment here: http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html#comments).
 However, Hama's BSP seemed to take a different approach...

Felix Halim

On Wed, Feb 3, 2010 at 7:08 PM, Hyunsik Choi <hyunsik.choi@gmail.com> wrote:
> Hi Felix Halim,
> Such terrible global sync problem may only occurs in a few of problems
> that have global states. We are trying to research better ways.
> However, many problems (e.g., finding isomorphic subgraphs, computing
> clustering coefficient, and finding cohesive groups) that have global
> states don't have such global sync problems. The graph package of Hama
> is appropriate to these problems.
> We will think about PathOthersHaveFollowed(). Thank you for your
> advice! If you are interested in this project, why don't you
> participate in our project?
> Best regards,
> --
> Hyunsik Choi
> Database & Information Systems Group, Korea Univ.
> http://diveintodata.org
> On Wed, Feb 3, 2010 at 5:54 PM, Felix Halim <felix.halim@gmail.com> wrote:
>> On Wed, Feb 3, 2010 at 4:41 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >>> I think the needToVisit() function might as well need to communicate
>> >>> with other machine:
>> >
>> > Hmm, You're exactly right. In that example, needToVisit() function
>> > checks the IsVisited from some shared-space (e.g., HBase or DBMS, ...,
>> > etc). We wrote with intent to simplify it.
>> From the pseudocode, I see that every Vertex will request IsVisited.
>> In a large graph, the HBase or DBMS will be overwhelmed by many tiny
>> requests from each Vertex.
>> Does the needToVisit() has "bulk query" that aggregates the tiny
>> requests into a single request?
>> > Integer is the distance at "Map<Vertex, Integer> input, Map<Vertex,
>> > Integer> nextQueue", but it could be replaced as other object. for
>> > example, new PathWeHaveFollowed(). Then, perhaps we need not some
>> > shared-space.
>> But we also need to know PathOthersHaveFollowed() don't we?
>> That's why we need HBase to store the global "visited" states of each node.
>> Felix Halim

View raw message