hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: graph data representation for mapreduce
Date Fri, 01 Feb 2008 23:48:59 GMT

In my work, I am finding that sending around entire rows or columns of the
adjacency graph gives substantial gains, as does block decomposition of some
of the algorithms involved.
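
As a rough illustration of the row-at-a-time idea, here is a minimal sketch in plain Python (no Hadoop involved). It assumes each record is a node together with its whole adjacency row and a current hop count, and that one map/reduce round relaxes distances by one hop. The function names (map_row, reduce_node, bfs_round, shortest_hops) and the record layout are hypothetical, chosen for this example only; it is one possible reading of "sending around entire rows", not a description of any particular production job.

```python
from collections import defaultdict

INF = float("inf")

def map_row(node, row):
    """Map step: pass the whole row through and offer a hop count to each neighbour."""
    dist, neighbours = row
    yield node, ("ROW", dist, neighbours)
    if dist < INF:
        for n in neighbours:
            yield n, ("DIST", dist + 1)

def reduce_node(node, values):
    """Reduce step: keep the adjacency row and the smallest hop count seen."""
    best, neighbours = INF, []
    for v in values:
        if v[0] == "ROW":
            neighbours = v[2]
        best = min(best, v[1])
    return node, (best, neighbours)

def bfs_round(rows):
    """Simulate one map/shuffle/reduce pass in memory."""
    shuffled = defaultdict(list)
    for node, row in rows.items():
        for key, value in map_row(node, row):
            shuffled[key].append(value)
    return dict(reduce_node(k, vs) for k, vs in shuffled.items())

def shortest_hops(adjacency, source):
    """Repeat rounds until no distance changes (unweighted shortest paths)."""
    rows = {n: (0 if n == source else INF, list(nbrs))
            for n, nbrs in adjacency.items()}
    while True:
        new_rows = bfs_round(rows)
        if new_rows == rows:
            return {n: d for n, (d, _) in rows.items()}
        rows = new_rows
```

Because each record carries its full neighbour list, a round needs only one pass over the graph, at the cost of re-shuffling every row each time.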


On 2/1/08 2:51 PM, "Joydeep Sen Sarma" <jssarma@facebook.com> wrote:

> some of our biggest map reduce jobs have been graph related (not shortest path
> though).
> 
> map-reduce doesn't seem like the best computation platform for some of the
> jobs we have had to do. Even a huge graph does not require unheard-of amounts
> of memory to store as an adjacency list, but mapping (at least some) graph
> algorithms to map-reduce causes intermediate data to bloat to enormous sizes.
> 
> to that end we are moving away from pure map-reduce to hybrid models that work
> in tandem with large memory banks caching the graph. The trend towards cheap
> flash storage is a helpful factor (one that we haven't exploited yet though).
> 
> 
> 
> -----Original Message-----
> From: Peter W. [mailto:peter@marketingbrokers.com]
> Sent: Fri 2/1/2008 1:14 PM
> To: core-user@hadoop.apache.org
> Subject: Re: graph data representation for mapreduce
>  
> Cam,
> 
> Making a directed graph in Hadoop is not
> very difficult, but traversing it live might be,
> since the result is a separate file.
> 
> Basically, you emit the destination node
> as your key in the mapper and the from nodes as
> intermediate values. Concatenate the from values in
> the reducer, assigning a weight to each edge.
> 
> Assigned edge scores come from a computation
> done in the reducer or from a number passed with the key.
> 
> This gives a simple but weighted from/to
> depiction and can be experimented with and
> improved by subsequent passes or REST style
> calls in the mapper for mysqldb weights.
> 
> Later,
> 
> Peter W.
> 
> Cam Bazz wrote:
> 
>> Hello,
>> 
>> I have long been interested in storing graphs in databases, object
>> databases and Lucene-like indexes.
>> ....
>> 
>> Has anyone done any work on storing and processing graphs with map
>> reduce?
>> If I were to start, where would I start? I am interested in
>> finding shortest paths in a large graph.
> 
> 
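
The mapper/reducer scheme in Peter's message above (destination node as key, from nodes as values, weights assigned in the reducer) can be sketched roughly as follows, again in plain Python rather than Hadoop. The weighting rule used here, 1 divided by the number of incoming edges, is purely a placeholder assumption, since the message does not say how edge scores are computed; the names map_edge, reduce_destination and build_weighted_graph are likewise hypothetical.

```python
from collections import defaultdict

def map_edge(src, dst):
    """Map step: key on the destination node, emit the from node as the value."""
    yield dst, src

def reduce_destination(dst, sources):
    """Reduce step: gather all from nodes for one destination and weight each edge."""
    sources = sorted(sources)
    weight = 1.0 / len(sources)  # hypothetical scoring rule, not from the thread
    return dst, [(src, weight) for src in sources]

def build_weighted_graph(edges):
    """Simulate the shuffle in memory and run the reducer once per destination."""
    grouped = defaultdict(list)
    for src, dst in edges:
        for key, value in map_edge(src, dst):
            grouped[key].append(value)
    return dict(reduce_destination(k, vs) for k, vs in grouped.items())
```

The output is one record per destination node listing its weighted incoming edges, which matches the "separate file" result described in the message: a later pass has to read it back in order to traverse the graph.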

