hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Partitioning
Date Wed, 05 Dec 2012 12:56:37 GMT
Exactly,

maybe you want to first read up on all the different modes and how they are
configured:

http://wiki.apache.org/hama/GettingStarted#Modes

We also have some nice documentation as a PDF, which you can get here:

http://wiki.apache.org/hama/GettingStarted#Hama_0.6.0

The configuration property to change the number of tasks on every host is
"bsp.tasks.maximum", which is described as "The maximum number of BSP tasks
that will be run simultaneously by a groom server."

Setting this to 1 on every host where a groom server runs, and then
restarting your cluster, should do what you want to achieve.
I can recommend Puppet for maintaining these kinds of configurations.
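
For reference, a property entry along these lines in each host's
conf/hama-site.xml should pin every groom server to a single task (a sketch
in the usual Hadoop-style configuration format; adjust the file location to
your installation):

  <property>
    <name>bsp.tasks.maximum</name>
    <value>1</value>
    <description>The maximum number of BSP tasks that will be run
    simultaneously by a groom server.</description>
  </property>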

If you need a more formal complexity model for BSP applications, let me
know; I have derived one from Rob Bisseling's BSP model that is a better fit
for Apache Hama's style of computation.
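
(As a rough reference point, not the derived model itself: in the standard
BSP cost model a superstep costs about w + h*g + l, where w is the local
computation, h is the maximum number of data words any one task sends or
receives, g is the per-word communication cost, and l is the barrier
synchronization latency; the total running time is the sum over all
supersteps.)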



2012/12/5 Benedikt Elser <elser@disi.unitn.it>

> Ah, local mode, Bingo!
>
> About the communication costs: yes, I am aware of these; however, this is
> exactly what I want to test in the first place :) Hence I would need a
> bsp.distributed.tasks.maximum
>
> Thanks for the clarifications,
>
> Benedikt
>
> On Dec 5, 2012, at 12:05 PM, Thomas Jungblut wrote:
>
> > Because the property is called "local". This doesn't affect the
> > distributed mode.
> > Note that it can be really bad to run multiple tasks across different
> > host machines, because this increases your communication costs.
> >
> > 2012/12/5 Benedikt Elser <elser@disi.unitn.it>
> >
> >> Thank you, I will try that. However if I set bsp.local.tasks.maximum to 1,
> >> why doesn't it distribute one task to each machine?
> >>
> >> On Dec 5, 2012, at 11:58 AM, Thomas Jungblut wrote:
> >>
> >>> So it will spawn 12 tasks. If this doesn't put enough load on your
> >>> machines, try using smaller block sizes.
> >>>
> >>> 2012/12/5 Benedikt Elser <elser@disi.unitn.it>
> >>>
> >>>> Hi,
> >>>>
> >>>> thanks for your reply!
> >>>>
> >>>> Total size:    49078776 B
> >>>> Total dirs:    1
> >>>> Total files:   12
> >>>> Total blocks (validated):      12 (avg. block size 4089898 B)
> >>>>
> >>>> Benedikt
> >>>>
> >>>> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
> >>>>
> >>>>> So how many blocks has your data in HDFS?
> >>>>>
> >>>>> 2012/12/5 Benedikt Elser <elser@disi.unitn.it>
> >>>>>
> >>>>>> Hi List,
> >>>>>>
> >>>>>> I am using the hama-0.6.0 release to run graph jobs on various input
> >>>>>> graphs in an EC2-based cluster of size 12. However, as I see in the
> >>>>>> logs, not every node in the cluster contributes to that job (they have
> >>>>>> no tasklog/job<ID> dir and are idle). Theoretically, a distribution of
> >>>>>> 1 million nodes across 12 buckets should hit every node at least once.
> >>>>>> Therefore I think it's a configuration problem. So far I have messed
> >>>>>> around with these settings:
> >>>>>>
> >>>>>> <name>bsp.max.tasks.per.job</name>
> >>>>>> <name>bsp.local.tasks.maximum</name>
> >>>>>> <name>bsp.tasks.maximum</name>
> >>>>>> <name>bsp.child.java.opts</name>
> >>>>>>
> >>>>>> Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12
> >>>>>> did not have the desired effect. I also split the input into 12 files
> >>>>>> (because of something in 0.5 that was fixed in 0.6).
> >>>>>>
> >>>>>> Could you recommend some settings, or guide me through the system's
> >>>>>> partitioning decision? I thought it would be:
> >>>>>>
> >>>>>> Input -> input splits based on the input and the max* conf values ->
> >>>>>> number of tasks; HashPartition.class distributes IDs across that
> >>>>>> number of tasks.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Benedikt
> >>>>
> >>>>
> >>
> >>
>
>
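
Regarding the partitioning question quoted above: conceptually a hash
partitioner just assigns a vertex to bucket hash(id) mod numTasks, so a
million vertex IDs spread over 12 tasks should indeed touch every task.
A minimal, self-contained Java sketch of that idea (illustrative only, not
Hama's actual HashPartition implementation):

  public class HashPartitionSketch {

    // Illustrative stand-in for a hash partitioner: map an id to one of
    // numTasks buckets.
    static int partitionFor(long vertexId, int numTasks) {
      return (int) (Math.abs(vertexId) % numTasks);
    }

    public static void main(String[] args) {
      int numTasks = 12;
      long[] counts = new long[numTasks];
      // Distribute one million synthetic vertex ids across the 12 buckets.
      for (long id = 0; id < 1000000; id++) {
        counts[partitionFor(id, numTasks)]++;
      }
      for (int t = 0; t < numTasks; t++) {
        System.out.println("task " + t + ": " + counts[t] + " vertices");
      }
    }
  }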
