hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Partitioning
Date Wed, 05 Dec 2012 11:05:57 GMT
Because the property is called "local": it only applies to local mode and
doesn't affect distributed mode.
Note that it can actually be bad to compute multiple tasks on different host
machines, because this increases your communication costs.

2012/12/5 Benedikt Elser <elser@disi.unitn.it>

> Thank you, I will try that. However, if I set bsp.local.tasks.maximum to 1,
> why doesn't it distribute one task to each machine?
>
> On Dec 5, 2012, at 11:58 AM, Thomas Jungblut wrote:
>
> > So it will spawn 12 tasks. If this doesn't satisfy the load on your
> > machines, try to use smaller blocksizes.
> >
> > 2012/12/5 Benedikt Elser <elser@disi.unitn.it>
> >
> >> Hi,
> >>
> >> thanks for your reply!
> >>
> >> Total size:    49078776 B
> >> Total dirs:    1
> >> Total files:   12
> >> Total blocks (validated):      12 (avg. block size 4089898 B)
> >>
> >> Benedikt
> >>
> >> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
> >>
> >>> So how many blocks has your data in HDFS?
> >>>
> >>> 2012/12/5 Benedikt Elser <elser@disi.unitn.it>
> >>>
> >>>> Hi List,
> >>>>
> >>>> I am using the hama-0.6.0 release to run graph jobs on various input
> >>>> graphs in an EC2-based cluster of size 12. However, as I see in the
> >>>> logs, not every node on the cluster contributes to that job (they have
> >>>> no tasklog/job<ID> dir and are idle). Theoretically, a distribution of
> >>>> 1 million nodes across 12 buckets should hit every node at least once.
> >>>> Therefore I think it's a configuration problem. So far I messed around
> >>>> with these settings:
> >>>>
> >>>>  <name>bsp.max.tasks.per.job</name>
> >>>>  <name>bsp.local.tasks.maximum</name>
> >>>>  <name>bsp.tasks.maximum</name>
> >>>>  <name>bsp.child.java.opts</name>
> >>>>
> >>>> Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12
> >>>> did not have the desired effect. I also split the input into 12 files
> >>>> (because of something in 0.5 that was fixed in 0.6).
> >>>>
> >>>> Could you recommend some settings or guide me through the system's
> >>>> partitioning decision? I thought it would be:
> >>>>
> >>>> Input -> input splits (based on the input and the max* conf values) ->
> >>>> number of tasks; HashPartition.class distributes IDs across that
> >>>> number of tasks.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Benedikt
> >>
> >>
>
>
