giraph-user mailing list archives

From Arjun Sharma <>
Subject Re: Optimal configuration for Giraph on YARN
Date Wed, 29 Apr 2015 16:21:12 GMT
Hi Steven,

Thank you so much for your detailed reply! Actually, my second question was
about what happens if we do not set (defaults to 1) or
yarn.nodemanager.resource.cpu-vcores (defaults to 8), while we do set
giraph.numComputeThreads (say, to 16). I expect every worker will run 16
threads on 1 core, but wanted to see if you have the same understanding.
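
To make the arithmetic behind this question concrete, here is a rough sketch of how the two vcore settings and the thread count relate. It assumes the scheduler actually counts vcores when placing containers (which, as far as I understand, depends on how its resource calculator is configured), and it ignores memory entirely. The function and parameter names are made up for illustration and are not YARN or Giraph APIs. Note also that a vcore is a scheduling quantity, so whether 16 threads are really confined to one physical core likely depends on whether the cluster enforces CPU limits on containers.

```python
def workers_per_node(node_vcores: int, vcores_per_worker: int) -> int:
    """Containers that fit on one node, counting vcores only (assumption)."""
    if node_vcores < 1 or vcores_per_worker < 1:
        raise ValueError("vcore counts must be positive")
    return node_vcores // vcores_per_worker


def threads_per_vcore(compute_threads: int, vcores_per_worker: int) -> float:
    """Ratio of Giraph compute threads to the vcores requested per worker."""
    if compute_threads < 1 or vcores_per_worker < 1:
        raise ValueError("thread and vcore counts must be positive")
    return compute_threads / vcores_per_worker


# Defaults mentioned in this thread: 8 vcores per node, 1 vcore per task.
default_workers = workers_per_node(8, 1)        # 8 workers could share a node
default_ratio = threads_per_vcore(16, 1)        # 16 threads per requested vcore

# Steven's setup: both values set to 32, with 32 compute threads.
steven_workers = workers_per_node(32, 32)       # 1 worker per node
steven_ratio = threads_per_vcore(32, 32)        # 1 thread per requested vcore
```

Under these assumptions, leaving both settings at their defaults while asking for 16 compute threads means each worker requests 1 vcore but runs 16 threads, and up to 8 such workers could land on the same machine.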


On Wed, Apr 29, 2015 at 8:50 AM, Steven Harenberg <> wrote:

> Hey Arjun,
> I am glad someone finally responded to this thread. I am surprised no one
> else is trying to figure out these configuration settings...
> Here is my understanding of your questions (though I am not sure my
> answers are right):
> *Is setting both and
> yarn.nodemanager.resource.cpu-vcores required?*
> Yes, I believe you need both of these set or else they will revert to
> default values. Importantly, I think you should set these to the same value
> so that you spawn one mapper/giraph-worker per machine (as this was said to
> be optimal).
> Since I have 32 cores per machine, I have set both these values to 32, and
> this has worked to spawn only one worker per machine (unless I try to have a
> worker share a machine with the master).
> Check this page out:
> *What happens if they are not set, while giraph.numComputeThreads is set?*
> The above parameters specify how many cores per machine you are allowing
> for workers AND how many cores one worker will use. If you don't set
> *giraph.numComputeThreads*, then the worker will use the default number (I
> think that is 1) despite possibly being allocated more cores. Hence, I set
> *giraph.numComputeThreads*, *giraph.numInputThreads*, and
> *giraph.numOutputThreads* to be the same as the above two parameters: the
> total cores in one machine (for me 32).
> Giraph is never going to fully utilize the entire machine, so I don't
> think it's really possible to tell if these are correct settings, but all of
> this seems reasonable based on my experience and how these parameters are
> defined.
> *Are there any other parameters that must be set in order to make sure we
> are *really* using the cores, not just multi-threading on a single core?*
> No idea, but the above parameters and some memory configurations are all I
> set. The memory configurations are worse in my opinion, as I was running
> into memory issues and ended up having to manually set the following
> parameters:
>    - yarn.nodemanager.resource.memory-mb
>    - yarn.scheduler.minimum-allocation-mb
>    - yarn.scheduler.maximum-allocation-mb
>    -
>    - -yh (in Giraph arguments)
> All of these were required to be manually set to get Giraph to run without
> having memory issues.
> Best regards,
> Steve
> On Thu, Apr 23, 2015 at 8:15 PM, Arjun Sharma <> wrote:
>> Just bumping up this thread, as I am having the same question as Steven's.
>> Steven, did you get to know if setting both and
>> yarn.nodemanager.resource.cpu-vcores is required? What happens if they
>> are not set, while giraph.numComputeThreads is set? Are there any
>> other parameters that must be set in order to make sure we are *really*
>> using the cores, not just multi-threading on a single core?
>> On Wed, Mar 18, 2015 at 11:48 AM, Steven Harenberg <>
>> wrote:
>>> Hi all,
>>> Previously with MapReduceV1, the suggestion was to have a 1:1
>>> correspondence between workers and compute nodes (machines) and set the
>>> number of threads to be the number of cores per machine. To achieve
>>> this configuration, we would set "".
>>> Since workers correspond to mappers this would ensure there was one worker
>>> per machine.
>>> Now I am reading that with Yarn this property no longer exists as there
>>> aren't tasktrackers. Instead, we have the global properties
>>> "yarn.nodemanager.resource.cpu-vcores", which specifies the cores _per
>>> node_, and the property "", which specifies the
>>> cores _per map task_.
>>> If we want to have one mapper per node that is fully utilizing the
>>> machine, I assume we should just set =
>>> yarn.nodemanager.resource.cpu-vcores = the # of cores per node. Is this
>>> correct?
>>> Do I still need to set giraph.numComputeThreads to be the number of
>>> cores per node?
>>> Thanks,
>>> Steve
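
For the memory settings Steven lists in his reply, the following is a simplified sketch of how the three YARN values might interact when aiming for one worker per machine. It assumes the scheduler rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb and refuses requests above yarn.scheduler.maximum-allocation-mb; the exact rounding can differ between schedulers, so treat this as an approximation. The function names and the numbers in the example are hypothetical, not taken from the thread.

```python
def normalized_container_mb(requested_mb: int, min_alloc_mb: int,
                            max_alloc_mb: int) -> int:
    """Round a request up to a multiple of the minimum allocation (assumed rule)."""
    if requested_mb < 1 or min_alloc_mb < 1:
        raise ValueError("memory values must be positive")
    if requested_mb > max_alloc_mb:
        raise ValueError("request exceeds the maximum allocation")
    multiples = -(-requested_mb // min_alloc_mb)  # integer ceiling division
    return min(multiples * min_alloc_mb, max_alloc_mb)


def containers_by_memory(node_mb: int, container_mb: int) -> int:
    """Containers of a given size that fit in the node's memory budget."""
    if container_mb < 1:
        raise ValueError("container size must be positive")
    return node_mb // container_mb


# Hypothetical numbers: a 64 GB node budget and a worker asking for 60000 MB.
container_mb = normalized_container_mb(60000, 1024, 65536)
fits = containers_by_memory(65536, container_mb)
```

With these made-up values the request is rounded up to 60416 MB and only one such container fits per node, which matches the one-worker-per-machine goal. Whatever heap is passed with -yh presumably has to fit inside the container size chosen this way.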
