hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "PageRank" by thomasjungblut
Date Wed, 12 Sep 2012 12:47:56 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "PageRank" page has been changed by thomasjungblut:
http://wiki.apache.org/hama/PageRank?action=diff&rev1=9&rev2=10

  
   * Uses the PageRank algorithm described in the Google Pregel paper
   * Introduces partitioning and collective communication
-  * Lets the user submit his/her own TextFile to calculate the sites' Pagerank!
  
  == Usage ==
  
  {{{
- bin/hama jar ../hama-0.4.0-examples.jar pagerank <input path> <output path> [damping factor] [epsilon error] [tasks]
+ bin/hama jar ../hama-0.x.0-examples.jar pagerank <input path> <output path> [damping factor] [epsilon error] [tasks]
  }}}
  
  The default parameters for pagerank are:
@@ -39, +38 @@

  
  Make sure that every site listed as an outlink also appears somewhere in the file as a key site. Otherwise the job fails with NullPointerExceptions.
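A quick pre-flight check for the outlink rule above can be sketched in Python. This is only a sketch: it assumes each input line is tab-separated, with the key site first and its outlinks after it, and the function names `parse_adjacency` and `dangling_outlinks` are made up for illustration and are not part of Hama.

```python
from typing import Dict, List, Set


def parse_adjacency(lines: List[str]) -> Dict[str, List[str]]:
    """Parse lines of the ASSUMED form 'site<TAB>outlink1<TAB>outlink2...'."""
    graph: Dict[str, List[str]] = {}
    for line in lines:
        fields = [f for f in line.rstrip("\n").split("\t") if f]
        if not fields:
            continue  # skip blank lines
        # The first field is the key site, the rest are its outlinks.
        graph.setdefault(fields[0], []).extend(fields[1:])
    return graph


def dangling_outlinks(graph: Dict[str, List[str]]) -> Set[str]:
    """Return outlink targets that never appear as a key site."""
    targets = {target for outs in graph.values() for target in outs}
    return targets - set(graph)


# Hypothetical sample: c.com is linked to but has no line of its own.
sample = [
    "a.com\tb.com\tc.com\n",
    "b.com\ta.com\n",
]
missing = dangling_outlinks(parse_adjacency(sample))
print(sorted(missing))
```

If the printed list is not empty, add a line for each missing site (possibly with no outlinks, if your input format allows that) before submitting the job.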
  
- Now you need to transform the text file using:
- {{{
- bin/hama jar ../hama-0.4.0-examples.jar pagerank-text2seq /tmp/input.txt /tmp/out/
- }}}
- 
  Then you can run pagerank on it with:
  
  {{{
- bin/hama jar ../hama-0.4.0-examples.jar pagerank /tmp/out /tmp/pagerank-output
+ bin/hama jar ../hama-0.x.0-examples.jar pagerank /tmp/input/input.txt /tmp/pagerank-output
  }}}
  
  Note that based on what you have configured, the paths may be in HDFS or on local disk.
@@ -59, +53 @@

  The ranks of all pages should sum to 1.0; otherwise the algorithm is broken.
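
The sum check can be automated with a few lines of Python. This is a sketch under assumptions: it expects the output as text lines of the form `site<TAB>rank`, which may not match what your Hama version writes, and both the function name `ranks_sum_to_one` and the tolerance of 1e-6 are arbitrary choices for illustration.

```python
import math
from typing import Iterable


def ranks_sum_to_one(lines: Iterable[str], tolerance: float = 1e-6) -> bool:
    """Check that ranks in lines of the ASSUMED form 'site<TAB>rank' sum to 1.0."""
    ranks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last tab so the rank is always the final field.
        _site, rank = line.rsplit("\t", 1)
        ranks.append(float(rank))
    # fsum limits floating-point error when adding many small ranks.
    total = math.fsum(ranks)
    return math.isclose(total, 1.0, abs_tol=tolerance)


# Hypothetical output lines, not real results.
sample_output = ["a.com\t0.5\n", "b.com\t0.3\n", "c.com\t0.2\n"]
print(ranks_sum_to_one(sample_output))
```

A small tolerance is used because the ranks are floating-point values, so the sum is rarely exactly 1.0 even for a correct run.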
  
  
- == Sample Adjacencylist File ==
- 
- You can create a large pagerank input file by using the PagerankTeragen file from here: http://code.google.com/p/hama-shortest-paths/source/browse/trunk/hama-gsoc/src/de/jungblut/hama/util/PagerankTeragen.java
- 
- It is based on MapReduce and requires a running Hadoop cluster. You can create a file using
- 
- {{{
- hadoop/bin hadoop -jar <jar containing the pagerank teragen> <number of vertices> <number of reducers / output files> <number of edges per vertex> <output path>
- }}}
- 
- Have fun! If you are facing problems, feel free to ask questions on the official mailing list.
- 
- 
  == Implementation ==
  
  For detailed implementation questions, have a look at my blog.
- It describes the algorithm and focuses on the main ideas showing implementation things.
+ It describes the algorithm and focuses on the main ideas showing implementation things.

+ It contains ancient code from before Hama 0.5, which introduced the graph API.
  
  http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html
  
