Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-user@hadoop.apache.org
Received-SPF: neutral (nike.apache.org: local policy)
User-Agent: Microsoft-Entourage/11.3.3.061214
Date: Fri, 01 Feb 2008 15:48:59 -0800
Subject: Re: graph data representation for mapreduce
From: Ted Dunning <tdunning@veoh.com>
To: <core-user@hadoop.apache.org>
Message-ID: <C3C8EFEB.37727%tdunning@veoh.com>
Thread-Topic: graph data representation for mapreduce
Thread-Index: AchlF6IoH7VH1MaPQWCHxZ5yK4It9wACyXRwAAKO7oY=
In-Reply-To: 
 <FCDAD498A048A6428B2255D5D4587D3701E72EA4@sf2pmxb01.thefacebook.com>
Mime-version: 1.0
Content-type: text/plain;
	charset="US-ASCII"
Content-transfer-encoding: 7bit


In my work, I am finding that sending around entire rows or columns of the
adjacency graph gives substantial gains as does block decomposition of some
of the algorithms involved.


On 2/1/08 2:51 PM, "Joydeep Sen Sarma" <jssarma@facebook.com> wrote:

> some of our biggest map reduce jobs have been graph related (not shortest path
> though).
> 
> map-reduce doesn't seem like the best computation platform for some of the
> jobs we have had to do. Even a huge graph does not require unheard amounts of
> memory to store as an adjacency list. but mapping (at least some) graph
> algorithms to map-reduces causes intermediate data to bloat to enormous sizes.
> 
> to that end we are moving away from pure map-reduce to hybrid models that work
> in tandem with large memory banks caching the graph. The trend towards cheap
> flash storage is a helpful factor (one that we haven't exploited yet though).
> 
> 
> 
> -----Original Message-----
> From: Peter W. [mailto:peter@marketingbrokers.com]
> Sent: Fri 2/1/2008 1:14 PM
> To: core-user@hadoop.apache.org
> Subject: Re: graph data representation for mapreduce
>  
> Cam,
> 
> Making a directed graph in Hadoop is not
> very difficult but traversing live might be
> since the result is a separate file.
> 
> Basically, you kick out a destination node
> as your key in the mapper and from nodes as
> intermediate values. Concatenate from values in
> the reducer assigning weights for each edge.
> 
> Assigned edge scores come from a computation
> done in the reducer or number passed by key.
> 
> This gives a simple but weighted from/to
> depiction and can be experimented with and
> improved by subsequent passes or REST style
> calls in the mapper for mysqldb weights.
> 
> Later,
> 
> Peter W.
> 
> Cam Bazz wrote:
> 
>> Hello,
>> 
>> I have been long interested in storing graphs, in databases, object
>> databases and lucene like indexes.
>> ....
>> 
>> Has anyone done any work on storing and processing graphs with map
>> reduce?
>> If I were to start, where would I start from. I am interested in
>> finding
>> shortest paths in a large graph.
> 
>