giraph-user mailing list archives

From: Christian Krause
Subject: Re: Optimal configuration for benchmark
Date: Thu, 27 Jun 2013 18:12:49 GMT
Dear David,

This is a good starting point. Thanks.

I cannot share the benchmark yet because it is part of some ongoing 
research. Once we have decided what to do with it, I will be able to 
publish it.


On 06/27/2013 07:11 PM, David Boyd wrote:
> Christian:
>       I have actually been looking for a more general Giraph benchmark 
> and would love to test/play with
> what you have.
>       To answer your questions, we first need to assume a dedicated 
> cluster where your test is the only one running.
>        For the number of mappers we will assume that your cluster is 
> configured in the pseudo-standard way of one mapper per core (i.e. the 
> max mappers for each node equals the number of cores on that node). 
> Because Giraph is CPU-centric, it is pretty important that you not 
> oversubscribe the cores.
>          So for the number of mappers you should use <total cluster 
> mappers> - 1.  This is because Giraph needs one mapper for the master 
> node.
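The "total cluster mappers minus one" rule above can be sketched as simple arithmetic (the node and core counts here are hypothetical examples, not part of the original thread):

```python
# Worker count per the rule above: one mapper slot per core,
# minus one mapper reserved for the Giraph master.
# Node and core counts are hypothetical examples.
nodes = 10
cores_per_node = 8

total_cluster_mappers = nodes * cores_per_node  # 80 slots on the cluster
giraph_workers = total_cluster_mappers - 1      # one slot goes to the master

print(giraph_workers)  # 79
```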
>            HEAP_SIZE and the per-mapper heap property are basically 
> equivalent (but I prefer the latter).   In any case, part of this 
> question depends on what besides Hadoop is running on each node. 
> Generally, you want each mapper to have as much heap space as 
> possible.  The goal is to avoid swapping, leave enough memory free for 
> buffer cache, and have enough heap that each task does not need to 
> spend a ton of time in garbage collection.
> I like to look at an idle node and see what the base overhead of used 
> memory is.  Then, depending on the IO requirements of my job 
> (especially read IO), I reserve a portion of the remaining memory for 
> buffer cache and divide the remainder by the number of mappers.
>               That is sort of the top-down approach.  A bottom-up 
> approach would look at the size of objects being managed/used in a 
> mapper and compute upwards from there.
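The top-down sizing described above reduces to a short calculation (all figures below are hypothetical placeholders; measure the idle overhead on your own nodes):

```python
# Top-down per-mapper heap sizing, following the steps above.
# All figures are hypothetical placeholders.
node_memory_gb = 64.0
idle_overhead_gb = 4.0   # base memory usage observed on an idle node
buffer_cache_gb = 8.0    # reserved for OS buffer cache (read-heavy jobs need more)
mappers_per_node = 8     # one mapper per core

# The remaining memory is divided evenly across the node's mappers.
heap_per_mapper_gb = (node_memory_gb - idle_overhead_gb - buffer_cache_gb) / mappers_per_node
print(heap_per_mapper_gb)  # 6.5
```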
>                That said, -Xmx4G would be the low end of what I would 
> specify.   Also, you may want to set the options which change how Java 
> does garbage collection.
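For reference, heap and GC options for Hadoop map tasks were commonly set through the `mapred.child.java.opts` property (the Hadoop 1.x name); the fragment below is only an illustration of the kind of settings meant above, with example values, not a recommendation from the thread:

```xml
<!-- mapred-site.xml (Hadoop 1.x): example per-task JVM options.
     -Xmx sets the heap ceiling; the GC flag is one example of
     changing how Java does garbage collection. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4g -XX:+UseConcMarkSweepGC</value>
</property>
```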
> Hope this helps.
> On 6/27/2013 12:20 PM, Christian Krause wrote:
>> Hi,
>> I implemented a benchmark that allows me to generate an arbitrarily 
>> large graph (depending on the number of iterations). Now I would like 
>> to configure Giraph so that I can make the best use of my hardware 
>> for this benchmark. Based on the number of nodes in my cluster, their 
>> amount of main memory and number of cores, I am asking myself how do 
>> I determine the optimal parameters of Giraph / Hadoop, specifically:
>> - the number of used mappers
>> - the HEAP_SIZE environment variable
>> - the memory specified in the property
>> (any other relevant parameters?)
>> Also, I was wondering how well Giraph can handle computations which 
>> start with a very small graph and mutate it to a very large one. For 
>> example, if I understand correctly the number of mappers is not 
>> dynamically adjusted.
>> Any hints (or links to documentation) are highly appreciated.
>> Cheers,
>> Christian
> -- 
>  ============
> David W. Boyd
> Director, Engineering
> 7901 Jones Branch, Suite 700
> Mclean, VA 22102
> office: +1-571-279-2122
> fax:    +1-703-506-6703
> cell:   +1-703-402-7908
> ============
> First Robotic Mentor - FRC, FTC
> President - USSTEM Foundation
