Return-Path: Delivered-To: apmail-hadoop-core-user-archive@www.apache.org Received: (qmail 92567 invoked from network); 1 Feb 2008 23:49:47 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 1 Feb 2008 23:49:47 -0000 Received: (qmail 33266 invoked by uid 500); 1 Feb 2008 23:49:36 -0000 Delivered-To: apmail-hadoop-core-user-archive@hadoop.apache.org Received: (qmail 33238 invoked by uid 500); 1 Feb 2008 23:49:36 -0000 Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: core-user@hadoop.apache.org Delivered-To: mailing list core-user@hadoop.apache.org Received: (qmail 33229 invoked by uid 99); 1 Feb 2008 23:49:36 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 Feb 2008 15:49:36 -0800 X-ASF-Spam-Status: No, hits=2.8 required=10.0 tests=RCVD_IN_DNSWL_LOW,RCVD_NUMERIC_HELO,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Received: from [69.50.2.13] (HELO ex9.myhostedexchange.com) (69.50.2.13) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 Feb 2008 23:49:21 +0000 Received: from 206.169.1.36 ([206.169.1.36]) by ex9.hostedexchange.local ([69.50.2.13]) with Microsoft Exchange Server HTTP-DAV ; Fri, 1 Feb 2008 23:49:10 +0000 User-Agent: Microsoft-Entourage/11.3.3.061214 Date: Fri, 01 Feb 2008 15:48:59 -0800 Subject: Re: graph data representation for mapreduce From: Ted Dunning To: Message-ID: Thread-Topic: graph data representation for mapreduce Thread-Index: AchlF6IoH7VH1MaPQWCHxZ5yK4It9wACyXRwAAKO7oY= In-Reply-To: Mime-version: 1.0 Content-type: text/plain; charset="US-ASCII" Content-transfer-encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org In my work, I am finding that sending around entire rows or columns of the adjacency graph gives substantial gains as does block decomposition of some of the algorithms involved. On 2/1/08 2:51 PM, "Joydeep Sen Sarma" wrote: > some of our biggest map reduce jobs have been graph related (not shortest path > though). > > map-reduce doesn't seem like the best computation platform for some of the > jobs we have had to do. Even a huge graph does not require unheard amounts of > memory to store as an adjacency list. but mapping (at least some) graph > algorithms to map-reduces causes intermediate data to bloat to enormous sizes. > > to that end we are moving away from pure map-reduce to hybrid models that work > in tandem with large memory banks caching the graph. The trend towards cheap > flash storage is a helpful factor (one that we haven't exploited yet though). > > > > -----Original Message----- > From: Peter W. [mailto:peter@marketingbrokers.com] > Sent: Fri 2/1/2008 1:14 PM > To: core-user@hadoop.apache.org > Subject: Re: graph data representation for mapreduce > > Cam, > > Making a directed graph in Hadoop is not > very difficult but traversing live might be > since the result is a separate file. > > Basically, you kick out a destination node > as your key in the mapper and from nodes as > intermediate values. Concatenate from values in > the reducer assigning weights for each edge. > > Assigned edge scores come from a computation > done in the reducer or number passed by key. > > This gives a simple but weighted from/to > depiction and can be experimented with and > improved by subsequent passes or REST style > calls in the mapper for mysqldb weights. > > Later, > > Peter W. > > Cam Bazz wrote: > >> Hello, >> >> I have been long interested in storing graphs, in databases, object >> databases and lucene like indexes. >> .... >> >> Has anyone done any work on storing and processing graphs with map >> reduce? >> If I were to start, where would I start from. I am interested in >> finding >> shortest paths in a large graph. > >