hama-user mailing list archives

From: Benedikt Elser <el...@disi.unitn.it>
Subject: Partitioning
Date: Wed, 05 Dec 2012 10:43:46 GMT
Hi List,

I am using the hama-0.6.0 release to run graph jobs on various input graphs on an EC2-based
cluster of 12 machines. However, as far as I can see from the logs, not every node in the cluster
contributes to a job (some have no tasklog/job<ID> dir and stay idle). Theoretically, distributing
1 million vertices across 12 buckets should hit every node at least once, so I think it is a
configuration problem. So far I have experimented with these settings:

   <name>bsp.max.tasks.per.job</name>
   <name>bsp.local.tasks.maximum</name>
   <name>bsp.tasks.maximum</name>
   <name>bsp.child.java.opts</name>

Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12 did not have the desired
effect. I also split the input into 12 files (to work around an issue in 0.5 that was fixed
in 0.6).
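
For reference, this is roughly how I set them in hama-site.xml (the values are simply the
ones from my last attempt, not recommendations):

   <property>
     <name>bsp.local.tasks.maximum</name>
     <value>1</value>
   </property>
   <property>
     <name>bsp.max.tasks.per.job</name>
     <value>12</value>
   </property>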

Could you recommend some settings, or guide me through the system's partitioning decision?
My understanding so far is:

Input -> input splits based on the input and the max* conf values -> number of tasks
HashPartitioner.class distributes vertex IDs across that number of tasks.
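
Or, expressed as code (a minimal sketch of how I assume the default hash partitioner
behaves; the method name and the hashCode-modulo logic are my guess, not copied from the
Hama source):

   import java.util.Arrays;

   public class HashPartitionSketch {
       // Assumed behaviour: a vertex ID lands in partition
       // abs(hashCode) mod numTasks, i.e. one partition per task.
       static int partitionFor(String vertexId, int numTasks) {
           return Math.abs(vertexId.hashCode() % numTasks);
       }

       public static void main(String[] args) {
           int numTasks = 12;
           int[] counts = new int[numTasks];
           // With 1 million vertex IDs, every one of the 12 partitions
           // should receive a share, i.e. no task should stay idle.
           for (int i = 0; i < 1000000; i++) {
               counts[partitionFor("vertex-" + i, numTasks)]++;
           }
           System.out.println(Arrays.toString(counts));
       }
   }

If that is indeed how it works, every groom should see work as long as the job is really
split into 12 tasks, which brings me back to the configuration question above.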

Thanks,

Benedikt