giraph-user mailing list archives

From Jay Hutfles <>
Subject Re: Network flow problems in Giraph
Date Mon, 22 Apr 2013 04:30:41 GMT
That makes perfect sense, Claudio.  I was missing the need for the
algorithm to exclude shared global data, and how that's also required for
it to be vertex-centric...  I was naively thinking that it would be
feasible if the logic for graph alterations and messaging could be
determined locally, but there's obviously data behind the logic and
alterations.  That data must ALSO be local to the vertex.  Is that a fair

I think I'm having the most difficulty overcoming years of thinking
affected by Operations Research and Data Structures, where techniques like
breadth-first search are avoided for more complicated data structures or
techniques requiring tree pivots or network simplex methods...

To be honest, my first opinion of Pregel and Giraph was that they'd be
mainly useful for Conway's Life or Langton's Ant.  Ever since, I've been
hoping to find a way to leverage it as a better OR/optimization tool.  Kind
of how actor-based concurrency really changed my outlook on parallel
programming, I'm hoping vertex-centric thinking changes how I think about
graph algorithms.
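
To make the vertex-centric idea concrete, here is a minimal sketch of the BFS-style shortest-path computation Claudio suggests in the quoted message below. It is plain Python that simulates supersteps, not the Giraph API; the function name `bsp_shortest_hops` and the adjacency-dict input are assumptions made for illustration. Each vertex only ever reads its own value and the messages sent to it, so there is no shared global data.

```python
from collections import defaultdict

INF = float("inf")

def bsp_shortest_hops(edges, source):
    """Simulate Pregel-style supersteps for unweighted shortest paths (BFS).

    edges: dict mapping every vertex to a list of its out-neighbours
           (assumption: every vertex appears as a key, even with no arcs).
    Returns (hop counts per vertex, number of supersteps executed).
    """
    value = {v: INF for v in edges}   # the only state: one value per vertex
    inbox = {source: [0]}             # messages delivered in superstep 0
    supersteps = 0
    while inbox:                      # halt once no messages are in flight
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            # "compute" for vertex v: uses only v's value and v's messages
            best = min(messages)
            if best < value[v]:
                value[v] = best
                for w in edges[v]:
                    outbox[w].append(best + 1)
        inbox = outbox                # barrier: messages arrive next superstep
        supersteps += 1
    return value, supersteps

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
hops, steps = bsp_shortest_hops(graph, "a")
print(hops, steps)
```

Unlike Dijkstra's method, nothing here needs a global priority queue; a vertex that receives no improving message simply does nothing in that superstep.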

On a completely unrelated note, is there another term for "vertex-centric
thinking?"  I've been typing v-centric, and have gotten lazy and refer to
writing algorithms in "fifth person" (due to the V, and Roman numerals...)

On Sat, Apr 20, 2013 at 12:42 PM, Claudio Martella <> wrote:

> One disclaimer first: not all iterative algorithms fit well to Giraph.
> Specifically to your example, Dijkstra's method for single source shortest
> path does not. The main general reason is that it does not scale in
> parallel, and specifically to Giraph because it requires centralised shared
> data. In fact, it makes more sense to run a less-efficient but massively
> parallelizable algorithm: Breadth-first search (BFS).
> Coming to maxflow problem, you'd have to look for one of the algorithms
> that is iterative and fits easily to a local vertex-centric view (where
> vertices exchange information through messages). Quite honestly, I haven't
> had the need/chance to look at a maxflow problem till now, so I don't have
> a solution for you right now. After a quick look, it appears to me that
> *could* be a
> good starting point.
> On Thu, Apr 18, 2013 at 4:41 PM, Jay Hutfles <> wrote:
>> Actually, I think a max flow problem fits exactly with the batch
>> processing model you describe.  You are given a massive graph (with
>> predefined maximum flows along the edges between vertices), you run a
>> program, it produces an output (i.e. the flow along each edge) and it
>> terminates.  It's not necessarily adapting to any new inputs as it runs.
>>  But it is an iterative process.  Or, at least, many algorithms are.
>> I don't see it as being that different from Dijkstra's method for finding
>> the shortest distance between two nodes on a graph.  Each super step is
>> updating the labels along the graph, and when all nodes are labelled as
>> done, the algorithm finishes.  A max flow problem could be implemented
>> likewise, since there are labeling algorithms for determining the max flow
>> along a graph.
>> See, specifically the
>> Solutions section.  I think it should help clarify.
>> On Thu, Apr 18, 2013 at 9:16 AM, André Kelpe <> wrote:
>>> Hi Jay,
>>> this sounds like a continuous operation to me. Giraph is meant for
>>> batch processing of massive graphs, which produces an output after a
>>> successful run. You run a program, it produces an output, it
>>> terminates. From what I understand, a stream processing framework like
>>> storm ( could be a
>>> better fit for this. Please let me know, if I am missing something.
>>> André
>>> 2013/4/18 Jay Hutfles <>:
>>> > I'm new to Giraph, but am interested in its applications to classic
>>> > network flow problems, specifically max flow or min cost problems.
>>> > I've looked for BSP implementations of algorithms for these problems,
>>> > but I can't find any discussion regarding this online.  Has anyone had
>>> > luck implementing such problems?
>>> >
>>> >
>>> > The max flow problem seems like it should be adaptable to the BSP
>>> > model.  The flow augmenting algorithm developed by Ford and Fulkerson
>>> > is essentially:
>>> >
>>> > while the graph contains a path over which flow could be increased,
>>> >    increase flow for arcs on the path
>>> >
>>> > Identifying the flow augmenting paths is a simple labeling algorithm,
>>> > but I'm not sure how I'd implement the "while the graph contains ..."
>>> > condition.  Is that a super step above the labeling algorithm's super
>>> > steps?
>>> >
>>> >
>>> > And I have no idea how to start the min cost algorithm.  Anyone have
>>> > ideas for how to formulate this?
>>> >
>>> > Thanks for your time, and for the great work on Giraph!
> --
>    Claudio Martella

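For reference, the flow augmenting loop from the original question at the bottom of the thread can be sketched sequentially as follows. This is plain single-machine Python, not a BSP or Giraph formulation. It uses BFS as the labeling step, which is the Edmonds-Karp variant of Ford and Fulkerson's method; the function name `max_flow` and the dict-of-dicts capacity format are assumptions made for illustration. The outer `while True` plays the role of "while the graph contains a path over which flow could be increased", and each pass runs one complete labeling phase.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Augmenting-path max flow; capacity[u][v] is the limit on arc u->v.

    Assumes source != sink and non-negative capacities.
    """
    # Residual graph: forward arcs plus zero-capacity reverse arcs.
    residual = {source: {}, sink: {}}
    for u, arcs in capacity.items():
        for v, c in arcs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    total = 0
    while True:
        # Labeling phase: BFS from the source over arcs with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, spare in residual[u].items():
                if spare > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total              # no augmenting path is left

        # Walk back from the sink to find the bottleneck on the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Increase flow for arcs on the path (and credit the reverse arcs).
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

arcs = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
print(max_flow(arcs, "s", "t"))
```

Note that the bottleneck and augmentation steps walk a single path through shared `parent` and `residual` structures, which is exactly the kind of global state a vertex-centric version would have to replace with per-vertex data and messages.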