From: Apache Wiki
To: Apache Wiki
Reply-To: dev@hama.apache.org
Date: Thu, 24 Jan 2013 05:39:38 -0000
Message-ID: <20130124053938.55578.92065@eos.apache.org>
Subject: [Hama Wiki] Update of "PageRank" by edwardyoon

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "PageRank" page has been changed by edwardyoon:
http://wiki.apache.org/hama/PageRank?action=diff&rev1=10&rev2=11

+ This document assumes that you have already installed a Hama cluster and tested it using some examples.
+
  == PageRank ==

   * Uses the PageRank algorithm described in the Google Pregel paper
   * Introduces partitioning and collective communication

- == Usage ==
+ == Run PageRank on Hama Cluster ==
+
+ First of all, generate a symmetric adjacency matrix using the gen command.

  {{{
- bin/hama jar ../hama-0.x.0-examples.jar pagerank [damping factor] [epsilon error] [tasks]
+ % bin/hama jar hama-examples-0.x.0.jar gen symmetric 100 10 randomgraph 2
  }}}

- The default parameters for pagerank are:
+ This will create a graph with 100 nodes and 1K edges and store it as 2 partitions on HDFS in sequence file format. You can adjust the number of partitions and tasks to fit your cluster. Then, run PageRank using:

  {{{
- 0.85 0.001
+ % bin/hama jar hama-examples-0.x.0.jar pagerank randomgraph pagerankresult 4
  }}}

- As you can see, 0.85 is the damping factor, that is, the probability with which a user follows a link instead of "randomly" jumping to another site. See the [[http://en.wikipedia.org/wiki/PageRank#The_intentional_surfer_model|Random Surfer Model]].
+ == Submit your own graph ==

- 0.001 is the convergence error, which is always measured after an iteration. It tells how much the pagerank of all sites has changed. If you set this to a lower value, it will take more iterations.
+ See [[WriteHamaGraphFile]]

- == Submit your own Web-graph ==
-
- You can transform your graph into an adjacency list to fit the input which Hama is going to parse to calculate the Pagerank.
-
- The file that Hama can successfully parse is a TextFile that has the following layout:
-
- {{{
- Site1\tSite2\tSite3
- Site2\tSite3
- Site3
- }}}
-
- This piece of text links Site1 to Site2 and Site3, and Site2 to Site3; Site3 is a dangling node.
- As you can see, a site is always on the leftmost side (we call it the key-site), and its outlinks follow as the remaining elements, separated by tabs (\t).
-
- Make sure that every site's outlink can be found somewhere in the file as a key-site. Otherwise it will result in weird NullPointerExceptions.
-
- Then you can run pagerank on it with:
-
- {{{
- bin/hama jar ../hama-0.x.0-examples.jar pagerank /tmp/input/input.txt /tmp/pagerank-output
- }}}
-
- Note that, based on what you have configured, the paths may be in HDFS or on local disk.
-
- == Output ==
-
- The output is a double value between zero and 1.0, where 1.0 is a very "famous" site.
-
- All pages' ranks should sum to 1.0, otherwise the algorithm is broken.
-
-
- == Implementation ==
-
- For detailed questions about the implementation, have a look at my blog.
- It describes the algorithm and focuses on the main ideas behind the implementation.
- It contains ancient code from before Hama 0.5, where we introduced the graph API.
-
- http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html
-
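
For readers who want to check a tab-separated adjacency list of the kind described in the removed text before handing it to a cluster, the following is a minimal single-machine sketch in Python. It is not Hama's implementation and uses none of Hama's API. The function names (parse_adjacency, missing_key_sites, pagerank) are hypothetical, and spreading the rank of dangling nodes evenly over all sites is an assumption made here so that the ranks keep summing to 1.0. The defaults 0.85 and 0.001 are the damping factor and convergence error quoted in the old page text.

```python
# Single-machine sketch only; NOT Hama's implementation.
# All function names below are hypothetical and chosen for illustration.


def parse_adjacency(text):
    """Parse lines of 'key-site<TAB>outlink<TAB>outlink...' into a dict."""
    graph = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        site, *outlinks = line.split("\t")
        graph[site] = outlinks
    return graph


def missing_key_sites(graph):
    """Return outlinks that never appear as a key-site (leftmost column)."""
    return sorted({t for outs in graph.values() for t in outs if t not in graph})


def pagerank(graph, damping=0.85, epsilon=0.001, max_iterations=100):
    """Iterate until the summed absolute rank change drops below epsilon.

    Expects a non-empty graph in which every outlink is also a key-site.
    """
    n = len(graph)
    ranks = {site: 1.0 / n for site in graph}
    for _ in range(max_iterations):
        # Assumption: rank held by dangling nodes (no outlinks) is spread
        # evenly over all sites, which keeps the total at 1.0.
        dangling = sum(ranks[s] for s, outs in graph.items() if not outs)
        new_ranks = {s: (1.0 - damping) / n + damping * dangling / n for s in graph}
        # Each site passes the damped share of its rank to its outlinks.
        for site, outs in graph.items():
            for target in outs:
                new_ranks[target] += damping * ranks[site] / len(outs)
        # Convergence error, measured after each iteration.
        error = sum(abs(new_ranks[s] - ranks[s]) for s in graph)
        ranks = new_ranks
        if error < epsilon:
            break
    return ranks


# The three-site example from the old page text.
sample = "Site1\tSite2\tSite3\nSite2\tSite3\nSite3\n"
graph = parse_adjacency(sample)
ranks = pagerank(graph)
print(missing_key_sites(graph), ranks, sum(ranks.values()))
```

Running missing_key_sites first mirrors the warning in the old text: an outlink that never appears as a key-site would raise a KeyError in this sketch, much as it led to NullPointerExceptions in the example described there.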