mahout-user mailing list archives

From Manish Katyal <manish.kat...@gmail.com>
Subject Re: page rank algorithm?
Date Thu, 01 Jul 2010 19:16:25 GMT
I have a simple Page-rank algorithm for general purpose graphs implemented
using Python/Hadoop streaming.
It uses the simple power method. The Map-reduce algorithm is described in
http://static.last.fm/johan/huguk-20090414/paolo_castagna-pagerank.pdf.
One difference -- the transition probabilities along the edges are
non-uniform in my implementation.
For what it's worth, at the end of the ranking process, the code generates a
visualization of the network graph with the page-ranks for the vertices.
This file can be viewed using GUESS (http://graphexploration.cond.org/).
(Obviously, for web-scale datasets this visualization is worthless.)
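
A minimal single-machine sketch of the power method with non-uniform
transition probabilities is shown below. It is not the Hadoop streaming code
described above: the function name `pagerank`, the nested-dict edge format,
the damping factor of 0.85, and the handling of dangling vertices are all
assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Power-method PageRank on a weighted directed graph.

    edges: dict mapping source -> {target: weight}, with positive weights.
    (Hypothetical input format, chosen for this sketch.)
    Returns a dict mapping each vertex to its rank; ranks sum to 1.
    """
    # Collect every vertex, including ones that only appear as targets.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Normalize outgoing weights into transition probabilities.
    # This is where non-uniform edge weights enter the computation.
    probs = {}
    for src, targets in edges.items():
        total = sum(targets.values())
        if total > 0:
            probs[src] = {dst: w / total for dst, w in targets.items()}

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by vertices with no out-edges is spread uniformly
        # (an assumption; other conventions exist).
        dangling = sum(rank[v] for v in nodes if v not in probs)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}

        # Each vertex passes its rank along its out-edges in proportion
        # to the transition probabilities. In a MapReduce setting, the
        # map step would emit these contributions and the reduce step
        # would sum them per target vertex.
        for src, targets in probs.items():
            for dst, p in targets.items():
                new_rank[dst] += damping * rank[src] * p

        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    graph = {"a": {"b": 3.0, "c": 1.0}, "b": {"c": 1.0}, "c": {"a": 1.0}}
    for vertex, score in sorted(pagerank(graph).items()):
        print(vertex, round(score, 4))
```

Each pass of the loop corresponds to one iteration of the power method, so a
distributed version would run one map-reduce job per pass.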

I was planning on porting my code to Mahout as a good way of learning more
about Mahout.

However, if Ken is going to contribute this code, and the code is going to
be more scalable, then I can look at implementing something else -- perhaps
TextRank, SimRank...

Let me know,

- Manish


On Thu, Jul 1, 2010 at 9:24 AM, Ken Krugler <kkrugler_lists@transpac.com> wrote:

>
> On Jul 1, 2010, at 8:16am, Andrzej Bialecki wrote:
>
>> On 2010-06-30 21:11, Grant Ingersoll wrote:
>>
>>> On Jun 27, 2010, at 12:10 PM, Manish Katyal wrote:
>>>
>>>> Is there an implementation of the page-rank algorithm in Mahout?
>>>
>>> No, there isn't.  However, do you mean to implement one specifically for
>>> link analysis or a general purpose one?
>>
>> There is one in Nutch, but it's tied to the Nutch API.
>
> It's likely we'll be contributing one to Mahout - either based on Jimmy
> Lin's enhancements as described during Hadoop Summit on Tuesday, or we might
> try the "do it all with SVD" approach as previously proposed by Ted, and
> mentioned by Jake.
>
> -- Ken
>
> --------------------------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
>
>
>
>
>
