hama-user mailing list archives

From Felix Halim <felix.ha...@gmail.com>
Subject Re: Pregel article
Date Mon, 05 Jul 2010 11:23:59 GMT
On Mon, Jul 5, 2010 at 7:06 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> I agree with you that we should measure the number of iterations.
> And, as you said, there is still I/O overhead involved in reading and
> writing materialized data every time, even if the shuffle and sort of
> the reduce phase are avoided.

I'm particularly interested in how BSP handles the I/O overhead.
Suppose only a few vertices are active among millions of vertices.
How does BSP activate those vertices?
Are the vertices directly accessible?
Usually the vertices are stored in blocks of 64 MB.
Does BSP read the whole 64 MB block just to activate one vertex?
Or does BSP have some kind of indexing?
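
The two alternatives in the question can be sketched roughly as follows. This is a toy in-memory model, not Hama or HDFS code: the block layout (vertex v lives in block v // block_size) and the names build_blocks, activate_by_scan and activate_by_index are assumptions made only for illustration, and the sketch says nothing about which strategy BSP actually uses.

```python
def build_blocks(num_vertices, block_size):
    """Lay out vertex ids 0..num_vertices-1 in fixed-size blocks (assumed layout)."""
    ids = list(range(num_vertices))
    return [ids[i:i + block_size] for i in range(0, num_vertices, block_size)]


def activate_by_scan(blocks, active, block_size):
    """No index: read every record of each block that holds an active vertex."""
    found, records_read = set(), 0
    for b in sorted({v // block_size for v in active}):
        for record in blocks[b]:  # the whole block is read
            records_read += 1
            if record in active:
                found.add(record)
    return found, records_read


def activate_by_index(blocks, active, block_size):
    """With an index: seek directly to each active vertex's record."""
    found, records_read = set(), 0
    for v in active:
        record = blocks[v // block_size][v % block_size]  # direct access
        records_read += 1
        found.add(record)
    return found, records_read


blocks = build_blocks(num_vertices=1000, block_size=100)
active = {5, 7, 950}
scan_found, scan_reads = activate_by_scan(blocks, active, 100)
index_found, index_reads = activate_by_index(blocks, active, 100)
print(scan_reads, index_reads)
```

In this toy model the scan reads two full blocks to reach three vertices, while the indexed lookup reads only the three records, which is the gap the question is about.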

> IMO, BSP will communicate only the vertices which can't be solved
> locally, and I'm sure that the number of iterations will be less than
> or equal to that of the (M/R based) Schimmy approach.

As far as I know, the Schimmy approach doesn't reduce the number of iterations.
It is only used to avoid shuffling the "master" graph.
Here one MR iteration (or round) corresponds to one BSP "superstep".
So, the number of iterations for the MR-based version and the number of
supersteps for BSP should be the same.
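
A minimal sketch of this point, assuming a toy in-memory breadth-first search rather than real Hadoop or Hama code (the function names mr_rounds and bsp_supersteps and the example graph are hypothetical). Both loops push distances one hop per round, so they need the same number of productive rounds; they differ only in how many vertices each round touches.

```python
from collections import defaultdict

INF = float("inf")


def mr_rounds(graph, source):
    """MR-style BFS: every round passes over all vertices (map), keeps min (reduce)."""
    dist = {v: INF for v in graph}
    dist[source] = 0
    rounds = 0
    while True:
        new = dict(dist)
        for v, nbrs in graph.items():  # the whole graph is read each round
            if dist[v] < INF:
                for n in nbrs:
                    new[n] = min(new[n], dist[v] + 1)
        if new == dist:  # final pass only detects convergence, not counted
            return dist, rounds
        dist = new
        rounds += 1


def bsp_supersteps(graph, source):
    """BSP-style BFS: only vertices that received a message do any work."""
    dist = {v: INF for v in graph}
    dist[source] = 0
    inbox = defaultdict(list)
    for n in graph[source]:
        inbox[n].append(1)
    supersteps = 0
    while inbox:
        outbox = defaultdict(list)
        changed = False
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best
                changed = True
                for n in graph[v]:
                    outbox[n].append(best + 1)
        if not changed:  # messages arrived but nothing improved: converged
            break
        supersteps += 1
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return dist, supersteps


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4, 0], 4: []}
mr_dist, n_rounds = mr_rounds(graph, 0)
bsp_dist, n_supersteps = bsp_supersteps(graph, 0)
print(n_rounds, n_supersteps)
```

Both counts here exclude the final pass that only detects convergence; counting it on both sides would shift each total by the same amount for this graph.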

> Moreover, I hope we can compare them using Hama BSP soon.

I'm sure the BSP version will be more efficient, since BSP is like MR
with Schimmy built in.

Felix Halim
