giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From André Kelpe <>
Subject Re: All pairs shortest paths
Date Wed, 26 Sep 2012 19:14:45 GMT
Hi Venkata!

2012/9/25 Venkata Sastry Malladi <>:
> Thanks for the reply, Sebastian.
> The problem I'm working on is a routing algorithm for a country wide railway
> network. I was thinking about two ways to approach it.

How many vertices and edges do you have? Seems of reasonable size...

> 1. Pre calculate the routes. I understand that this is quadratic, but I was
> thinking of using some optimizations to limit the calculations. Like
> restrict the number of hops from the source node, only calculate the route
> to the nearest hub and not every other node, etc.

If this is really just the railroad network, that should be fine. I
have done something similar at my previous job with the road network
on country basis and it took only a few minutes to actually run the
thing on a small-ish cluster. Those networks are usually not billions
of vertices, so no problem as n^2 is still pretty small.

The giraph job we made, was part of a bigger pipeline of jobs (we used
crunch for that btw). The first job would cut out the bare minimum of
data (the graph minus any other attribution), which was the input for
giraph. Then a giraph job read that input created an output file,
which was used as an index by the final algorithm. (We actually
calculated all the routes from all the vertices twice (in both
directions) and filtered in the output writer and that was still fast
enough for us.)

You can do something similar, where you basically use giraph to
calculate your graph index and then stick that into some database
(hbase, redis,  mysql whatever works for your dataset) and query that
from your program.

> 2. Use Giraph as a real time query engine - The graph is loaded into the
> memory of a cluster of nodes, and given the source and destination nodes of
> a query, a single source shortest path algorithm is run.

It is not made for real-time queries. As I said, just use giraph to
make your index and keep that in memory with the db of your choice.



View raw message