hama-user mailing list archives

From Benedikt Elser <el...@disi.unitn.it>
Subject Re: Partitioning
Date Wed, 05 Dec 2012 10:55:33 GMT
Hi,

thanks for your reply!

 Total size:    49078776 B
 Total dirs:    1
 Total files:   12
 Total blocks (validated):      12 (avg. block size 4089898 B)
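
As a quick sanity check on these figures, here is a minimal Python sketch. It assumes one input split (and thus one task) per HDFS block, which is an assumption about how splits are derived and is not confirmed in this thread; the variable names are illustrative only.

```python
# Figures copied from the fsck output above.
total_size_bytes = 49_078_776
total_files = 12
total_blocks = 12

# Average block size, matching the "avg. block size" reported by fsck.
avg_block_size = total_size_bytes // total_blocks

# Each file fits in a single block here.
blocks_per_file = total_blocks // total_files

# Assumption: one input split (and thus one task) per HDFS block.
expected_splits = total_blocks

print(avg_block_size, blocks_per_file, expected_splits)
```

Under that assumption, 12 blocks would be enough to give each of the 12 machines one task.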

Benedikt

On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:

> So how many blocks does your data have in HDFS?
> 
> 2012/12/5 Benedikt Elser <elser@disi.unitn.it>
> 
>> Hi List,
>> 
>> I am using the hama-0.6.0 release to run graph jobs on various input
>> graphs in an EC2-based cluster of 12 machines. However, as I see in the
>> logs, not every machine in the cluster contributes to the job (the idle
>> ones have no tasklog/job<ID> dir). Theoretically, distributing 1 million
>> graph nodes across 12 buckets should hit every machine at least once.
>> Therefore I think it's a configuration problem. So far I have tried
>> these settings:
>> 
>>   <name>bsp.max.tasks.per.job</name>
>>   <name>bsp.local.tasks.maximum</name>
>>   <name>bsp.tasks.maximum</name>
>>   <name>bsp.child.java.opts</name>
>> 
>> Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12
>> did not have the desired effect. I also split the input into 12 files
>> (because of an issue in 0.5 that was fixed in 0.6).
>> 
>> Could you recommend some settings or guide me through the system's
>> partitioning decision? I thought it would be:
>> 
>> Input -> Input Split based on input, max* conf values -> number of tasks
>> HashPartition.class distributes IDs across that number of tasks.
>> 
>> Thanks,
>> 
>> Benedikt
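
The expectation in the quoted message, that 1 million IDs hashed into 12 buckets should reach every bucket, can be checked with a small sketch. This is a generic hash-modulo partitioner written in Python for illustration; it is not Hama's implementation, and the names `partition`, `NUM_TASKS`, and `NUM_VERTICES` are hypothetical. It also assumes vertex IDs are the integers 0 to 999,999.

```python
from collections import Counter

NUM_TASKS = 12            # number of tasks (buckets), matching the cluster size
NUM_VERTICES = 1_000_000  # graph size mentioned in the thread


def partition(vertex_id: int, num_tasks: int) -> int:
    """Generic hash-modulo partitioner (illustrative, not Hama's code)."""
    return hash(vertex_id) % num_tasks


# Count how many vertex IDs land in each bucket.
bucket_sizes = Counter(partition(v, NUM_TASKS) for v in range(NUM_VERTICES))

for task, size in sorted(bucket_sizes.items()):
    print(f"task {task:2d}: {size} vertices")
```

If every bucket is non-empty in a check like this while some machines still sit idle, the cause is more likely the number of tasks launched or where they are scheduled than the hash distribution itself.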

