I have a simple PageRank implementation for general-purpose graphs, written
in Python with Hadoop streaming.
It uses the simple power method. The MapReduce algorithm is described in
http://static.last.fm/johan/huguk-20090414/paolo_castagna-pagerank.pdf.
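For readers unfamiliar with the MapReduce form of the power method, here is a minimal sketch of one iteration written as Hadoop-streaming-style mapper and reducer functions. It is not taken from the linked slides. The input line format, the damping factor of 0.85 and the function names are all hypothetical, rank is split uniformly over out-links, and nodes without out-links simply lose their rank mass.

```python
from itertools import groupby

DAMPING = 0.85  # assumed damping factor; not stated in the original message


def mapper(lines):
    """Map step for one PageRank iteration.

    Hypothetical input format, one node per line:
        node<TAB>rank<TAB>comma-separated out-links
    """
    for line in lines:
        node, rank, links = line.rstrip("\n").split("\t")
        out = [dst for dst in links.split(",") if dst]
        # Re-emit the adjacency list so the reducer can write the graph back out.
        yield node, "L" + links
        # Split this node's rank uniformly across its out-links.
        # A node with no out-links contributes nothing (its mass is dropped).
        for dst in out:
            yield dst, "R" + repr(float(rank) / len(out))


def reducer(pairs, num_nodes):
    """Reduce step: sum the rank contributions arriving at each node.

    `pairs` must be sorted by key, which Hadoop's shuffle/sort guarantees.
    """
    for node, group in groupby(pairs, key=lambda kv: kv[0]):
        links, incoming = "", 0.0
        for _, value in group:
            if value.startswith("L"):
                links = value[1:]
            else:
                incoming += float(value[1:])
        rank = (1.0 - DAMPING) / num_nodes + DAMPING * incoming
        yield f"{node}\t{rank}\t{links}"


def run_iteration(lines):
    """Local stand-in for one Hadoop job: map, sort by key, reduce."""
    pairs = sorted(mapper(lines))
    return list(reducer(pairs, len(lines)))


if __name__ == "__main__":
    graph = ["a\t0.25\tb,c", "b\t0.25\tc", "c\t0.25\ta", "d\t0.25\tc"]
    for line in run_iteration(graph):
        print(line)
```

Because each output line has the same format as the input, the job can be chained: the reducer output of iteration k becomes the mapper input of iteration k+1. Under real Hadoop streaming the two functions would read `sys.stdin` and print tab-separated key/value lines instead.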
One difference: in my implementation the transition probabilities along the
edges are non-uniform, rather than 1/out-degree for every out-edge.
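To illustrate what non-uniform transition probabilities mean for the power method, here is a small in-memory sketch, assuming each edge carries a non-negative weight that is normalised per source node. The function name, the dict-of-dicts graph representation and the uniform redistribution of rank from nodes without out-edges are assumptions for illustration, not details of the implementation described above.

```python
def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-method PageRank with weighted (non-uniform) transitions.

    edges: dict mapping node -> {neighbor: weight}, weights assumed >= 0.
    The transition probability v -> u is weight(v, u) / sum of v's weights.
    """
    nodes = set(edges)
    for nbrs in edges.values():
        nodes.update(nbrs)
    nodes = list(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            out = edges.get(v, {})
            total = sum(out.values())
            if total == 0:
                # No outgoing weight: this mass is spread uniformly below.
                dangling += rank[v]
                continue
            for u, w in out.items():
                new[u] += damping * rank[v] * w / total
        for v in nodes:
            new[v] += damping * dangling / n
        # Stop once the L1 change between iterations is below tol.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


graph = {"a": {"b": 3.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}}
ranks = weighted_pagerank(graph)
print(sorted(ranks.items()))
```

With these weights `b` receives three times as much of `a`'s rank as `c` does, so it ends up ranked higher. With uniform weights the two would tie.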
For what it's worth, at the end of the ranking process the code generates a
visualization of the network graph showing the PageRank of each vertex.
This file can be viewed using GUESS (http://graphexploration.cond.org/).
(Obviously, for web-scale datasets this visualization is worthless.)
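As a rough illustration of that last step, here is a sketch that writes a weighted graph and its PageRank scores as text for GUESS. It assumes GUESS's GDF layout (a `nodedef>` section followed by an `edgedef>` section); the column names, column types and function name are assumptions, so check the GUESS documentation before relying on them. Node names containing commas would also need quoting, which this sketch does not do.

```python
def to_gdf(edges, ranks):
    """Render a weighted graph plus PageRank scores as GDF-style text.

    edges: dict mapping node -> {neighbor: weight}
    ranks: dict mapping node -> PageRank score
    Assumes GUESS accepts this nodedef>/edgedef> layout.
    """
    lines = ["nodedef>name VARCHAR,pagerank DOUBLE"]
    for node in sorted(ranks):
        lines.append(f"{node},{ranks[node]}")
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE")
    for src in sorted(edges):
        for dst, weight in sorted(edges[src].items()):
            lines.append(f"{src},{dst},{weight}")
    return "\n".join(lines) + "\n"


print(to_gdf({"a": {"b": 2.0}, "b": {"a": 1.0}}, {"a": 0.5, "b": 0.5}))
```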
I was planning on porting my code to Mahout as a good way of learning more
about Mahout.
However, if Ken is going to contribute that code, and it will be more
scalable, then I can look at implementing something else instead, perhaps
TextRank or SimRank.
Let me know,
- Manish
On Thu, Jul 1, 2010 at 9:24 AM, Ken Krugler <kkrugler_lists@transpac.com> wrote:
>
> On Jul 1, 2010, at 8:16am, Andrzej Bialecki wrote:
>
>> On 2010-06-30 21:11, Grant Ingersoll wrote:
>>
>>> On Jun 27, 2010, at 12:10 PM, Manish Katyal wrote:
>>>
>>>> Is there an implementation of the page-rank algorithm in Mahout?
>>>
>>> No, there isn't. However, do you mean to implement one specifically for
>>> link analysis or a general purpose one?
>>>
>>
>> There is one in Nutch, but it's tied to the Nutch API.
>>
>
> It's likely we'll be contributing one to Mahout - either based on Jimmy
> Lin's enhancements as described during Hadoop Summit on Tuesday, or we might
> try the "do it all with SVD" approach as previously proposed by Ted, and
> mentioned by Jake.
>
> -- Ken
>
> --------------------------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c w e b m i n i n g