hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "WriteHamaGraphFile" by thomasjungblut
Date Sun, 27 May 2012 20:04:44 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "WriteHamaGraphFile" page has been changed by thomasjungblut:
http://wiki.apache.org/hama/WriteHamaGraphFile?action=diff&rev1=3&rev2=4

  
  For this example, the Wikipedia link dataset is used (http://haselgrove.id.au/wikipedia.htm) / (http://users.on.net/~henry/pagerank/links-simple-sorted.zip).
  
- The dataset contains 5,716,808 pages and 130,160,392 links and is unzipped ~1gb large. You should use a smallish cluster to crunch this dataset with Hama, based on the blocksize of HDFS a slot number of 8-32 is required. We tell you later how to fine tune this to use fewer slots if you don't have them currently.
+ The dataset contains 5,716,808 pages and 130,160,392 links and is about 1 GB unzipped. You should use a smallish cluster to crunch this dataset with Hama: depending on the HDFS block size, 16-32 task slots are required.
  
  The file is formatted like this
  
@@ -218, +218 @@

  
  '''Troubleshooting'''
  
- If your job does not execute, your cluster may not have enough resources (task slots). 
+ If your job does not execute, your cluster may not have enough resources (task slots). 

- You can either increase them, or decrease the minimum split size by setting:
+ Symptoms may look like this in the bsp master log:
  {{{
-    pageJob.set("bsp.min.split.size", (512 * 1024 * 1024) + "");
+ 2012-05-27 20:00:51,228 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+ 2012-05-27 20:00:51,288 INFO org.apache.hama.bsp.JobInProgress: num BSPTasks: 16
+ 2012-05-27 20:00:51,305 INFO org.apache.hama.bsp.JobInProgress: Job is initialized.
+ 2012-05-27 20:00:51,313 ERROR org.apache.hama.bsp.SimpleTaskScheduler: Scheduling of job Pagerank could not be done successfully. Killing it!
+ 2012-05-27 20:01:08,334 INFO org.apache.hama.bsp.JobInProgress: num BSPTasks: 16
+ 2012-05-27 20:01:08,339 INFO org.apache.hama.bsp.JobInProgress: Job is initialized.
+ 2012-05-27 20:01:08,340 ERROR org.apache.hama.bsp.SimpleTaskScheduler: Scheduling of job Pagerank could not be done successfully. Killing it!
  }}}
- This will set the split size to 512mb, thus having 2 tasks and not 32 or 16.
+ 
+ This was run on an 8-slot cluster, but the job required 16 slots because of the 64 MB block size of HDFS.
+ Either re-upload the file with a larger block size so that the number of blocks matches the available slots, or increase the number of slots in your Hama cluster.
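
The slot arithmetic above can be sketched as follows. This is only an illustration, not part of Hama: the class `SlotEstimate` and the method `requiredTasks` are hypothetical names, and the sketch assumes that one BSP task is created per HDFS block and that the dataset is exactly 1 GiB.

```java
public class SlotEstimate {

    /**
     * Estimates the number of tasks (and therefore slots) a job needs,
     * assuming one task per HDFS block. Hypothetical helper, not a Hama API.
     */
    static long requiredTasks(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("fileBytes must be >= 0 and blockBytes must be > 0");
        }
        // Ceiling division: a partially filled last block still needs its own task.
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileBytes = 1024L * mb; // assumption: the unzipped dataset is exactly 1 GiB

        // 64 MB blocks: 16 tasks, which matches "num BSPTasks: 16" in the log.
        System.out.println(requiredTasks(fileBytes, 64 * mb));

        // 128 MB blocks: 8 tasks, which would fit the 8-slot cluster.
        System.out.println(requiredTasks(fileBytes, 128 * mb));
    }
}
```

Under these assumptions, re-uploading the file with a 128 MB block size would bring the task count down to the 8 available slots.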
  
  If you sort the result descending by pagerank you can see the following top 10 sites:
  
