giraph-user mailing list archives

From Steven Harenberg <sdhar...@ncsu.edu>
Subject Re: Optimal configuration for Giraph on YARN
Date Wed, 29 Apr 2015 15:50:01 GMT
Hey Arjun,

I am glad someone finally responded to this thread. I am surprised no one
else is trying to figure out these configuration settings...

Here are my answers to your questions (though I am not sure they are
right):


*Is setting both mapreduce.map.cpu.vcores and
yarn.nodemanager.resource.cpu-vcores required?*

Yes, I believe you need both of these set or else they will revert to
default values. Importantly, I think you should set these to the same value
so that you spawn one mapper/giraph-worker per machine (as this was said to
be optimal).

Since I have 32 cores per machine, I have set both of these values to 32,
and that has worked to spawn only one worker per machine (unless I try to
have a worker share a machine with the master).

Check this page out:
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/


*What happens if they are not set, while giraph.numComputeThreads is set?*

The above parameters specify how many cores per machine you are allowing
for workers AND how many cores one worker will use. If you don't set
*giraph.numComputeThreads*, then the worker will use the default number (I
think that is 1) despite possibly being allocated more cores. Hence, I set
*giraph.numComputeThreads*, *giraph.numInputThreads*, and
*giraph.numOutputThreads* to the same value as the above two parameters,
the total number of cores in one machine (for me, 32).
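
To make the relationship between these settings concrete, here is a minimal
Python sketch that builds them from a single core count. The function names
and the grouping into "yarn", "mapreduce", and "giraph" are hypothetical and
only for illustration; the property names are the ones discussed above. It
also assumes the scheduler takes vcores into account when placing
containers, which may not hold on every cluster.

```python
def one_worker_per_machine(cores_per_machine):
    """Return the settings described above, grouped by where they are set.

    The grouping keys are illustrative only, not part of Hadoop or Giraph.
    """
    if cores_per_machine < 1:
        raise ValueError("cores_per_machine must be at least 1")
    value = str(cores_per_machine)
    return {
        # NodeManager side: vcores each machine offers to YARN containers.
        "yarn": {"yarn.nodemanager.resource.cpu-vcores": value},
        # Job side: vcores requested by each map task (one Giraph worker).
        "mapreduce": {"mapreduce.map.cpu.vcores": value},
        # Giraph side: threads each worker starts.
        "giraph": {
            "giraph.numComputeThreads": value,
            "giraph.numInputThreads": value,
            "giraph.numOutputThreads": value,
        },
    }


def workers_per_machine(settings):
    """Map-task containers that fit on one machine, counting vcores only.

    Assumption: the scheduler considers vcores when placing containers.
    """
    node = int(settings["yarn"]["yarn.nodemanager.resource.cpu-vcores"])
    task = int(settings["mapreduce"]["mapreduce.map.cpu.vcores"])
    return node // task


settings = one_worker_per_machine(32)
for group, props in settings.items():
    for name, val in props.items():
        print(f"{group}: {name}={val}")
```

Setting the per-task vcores equal to the per-node vcores is what leaves room
for only one worker on each machine; the thread counts then tell that worker
to use the cores it was given.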

Giraph is never going to fully utilize the entire machine, so I don't think
it's really possible to tell if these are the correct settings, but all of this
seems reasonable based on my experience and how these parameters are
defined.



*Are there any other parameters that must be set in order to make sure we
are *really* using the cores, not just multi-threading on a single core?*

No idea, but the above parameters and some memory configurations are all I
set. The memory configurations are harder to get right, in my opinion: I
was running into memory issues and ended up having to manually set the
following parameters:

   - yarn.nodemanager.resource.memory-mb
   - yarn.scheduler.minimum-allocation-mb
   - yarn.scheduler.maximum-allocation-mb
   - mapreduce.map.memory.mb
   - -yh (in Giraph arguments)

I had to set all of these manually to get Giraph to run without memory
issues.
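
As a rough aid for keeping these five values consistent with each other,
here is a small Python sketch of a sanity check. The function name, its
arguments, and the example numbers are all hypothetical. The rules it
encodes are assumptions about how the settings relate (in particular, that
the -yh heap should be smaller than the map container), not something
confirmed in this thread.

```python
def check_memory_settings(node_mb, min_alloc_mb, max_alloc_mb, map_mb, heap_mb):
    """Return a list of likely inconsistencies among the memory settings.

    node_mb      -> yarn.nodemanager.resource.memory-mb
    min_alloc_mb -> yarn.scheduler.minimum-allocation-mb
    max_alloc_mb -> yarn.scheduler.maximum-allocation-mb
    map_mb       -> mapreduce.map.memory.mb
    heap_mb      -> value passed to -yh in the Giraph arguments
    """
    problems = []
    if min_alloc_mb > max_alloc_mb:
        problems.append("minimum-allocation-mb exceeds maximum-allocation-mb")
    if max_alloc_mb > node_mb:
        problems.append("maximum-allocation-mb exceeds what one NodeManager offers")
    if not (min_alloc_mb <= map_mb <= max_alloc_mb):
        problems.append("mapreduce.map.memory.mb is outside the allocation range")
    # Assumption: the worker heap must fit inside the map container with
    # some headroom for non-heap memory.
    if heap_mb >= map_mb:
        problems.append("-yh heap leaves no headroom inside the map container")
    return problems


# Made-up example values in MB, for illustration only.
for problem in check_memory_settings(
    node_mb=65536, min_alloc_mb=1024, max_alloc_mb=61440,
    map_mb=61440, heap_mb=51200,
):
    print(problem)
```

An empty result only means the values do not obviously contradict each
other; it does not mean they are large enough for a given graph.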

Best regards,
Steve

On Thu, Apr 23, 2015 at 8:15 PM, Arjun Sharma <as469613@gmail.com> wrote:

> Just bumping up this thread, as I am having the same question as Steven's.
>
> Steven, did you get to know if setting both mapreduce.map.cpu.vcores and
> yarn.nodemanager.resource.cpu-vcores is required? What happens if they
> are not set, while giraph.numComputeThreads is set? Are there any
> other parameters that must be set in order to make sure we are *really*
> using the cores, not just multi-threading on a single core?
>
>
> On Wed, Mar 18, 2015 at 11:48 AM, Steven Harenberg <sdharenb@ncsu.edu>
> wrote:
>
>> Hi all,
>>
>> Previously with MapReduceV1, the suggestion was to have a 1:1
>> correspondence between workers and compute nodes (machines) and set the
>> number of threads to be the number of cores per machine. To achieve
>> this configuration, we would set "mapred.tasktracker.map.tasks.maximum=1".
>> Since workers correspond to mappers this would ensure there was one worker
>> per machine.
>>
>> Now I am reading that with YARN this property no longer exists, as there
>> aren't tasktrackers. Instead, we have the global properties
>> "yarn.nodemanager.resource.cpu-vcores", which specifies the cores _per
>> node_, and the property "mapreduce.map.cpu.vcores", which specifies the
>> cores _per map task_.
>>
>> If we want to have one mapper per node that is fully utilizing the
>> machine, I assume we should just set mapreduce.map.cpu.vcores =
>> yarn.nodemanager.resource.cpu-vcores = the # of cores per node. Is this
>> correct?
>>
>> Do I still need to set giraph.numComputeThreads to be the number of cores
>> per node?
>>
>> Thanks,
>> Steve
>>
>
>
