giraph-user mailing list archives

From Muaz Twaty <muaz.tw...@euranova.eu>
Subject Re: Giraph automated parameter tuning
Date Thu, 21 Mar 2019 08:54:07 GMT
Hi Dionysios, thank you for your reply :D.

We are planning to try multiple optimization algorithms and maybe implement
a search algorithm (such as hill climbing guided by a prediction model). We
are targeting very expensive (time-consuming) graph jobs.
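
To make the idea concrete, a minimal sketch of such a model-guided hill climb
could look like the following. The candidate value grids, the `predict` and
`run_job` callables, and the synthetic `toy_runtime` cost are all assumptions
made up for illustration; a real setup would launch actual Giraph jobs and use
a trained prediction model instead.

```python
# Hypothetical search space: the parameter names come from this thread, but
# the candidate values are illustrative assumptions, not recommendations.
SEARCH_SPACE = {
    "workers": [2, 4, 8, 16],
    "giraph.numComputeThreads": [1, 2, 4, 8],
    "giraph.nettyServerThreads": [4, 8, 16, 32],
}


def neighbors(config, space):
    """Yield configs that differ from `config` by one grid step in one parameter."""
    for name, values in space.items():
        i = values.index(config[name])
        for j in (i - 1, i + 1):
            if 0 <= j < len(values):
                yield {**config, name: values[j]}


def hill_climb(start, space, predict, run_job, max_steps=20):
    """Greedy search for a low runtime.

    predict(config) -> estimated runtime (cheap prediction model)
    run_job(config) -> measured runtime (expensive real job)
    Only the neighbor the model ranks best is actually executed per step.
    """
    best, best_time = start, run_job(start)
    for _ in range(max_steps):
        candidates = list(neighbors(best, space))
        if not candidates:
            break
        candidate = min(candidates, key=predict)
        measured = run_job(candidate)
        if measured >= best_time:
            break  # no measured improvement: treat as a local optimum
        best, best_time = candidate, measured
    return best, best_time


def toy_runtime(config):
    """Synthetic stand-in for a real job; cheapest at 8 workers, 4 and 16 threads."""
    return (abs(config["workers"] - 8)
            + abs(config["giraph.numComputeThreads"] - 4)
            + abs(config["giraph.nettyServerThreads"] - 16) / 4)


start = {
    "workers": 2,
    "giraph.numComputeThreads": 1,
    "giraph.nettyServerThreads": 4,
}
# Here the "model" is perfect (same function as the job), which is unrealistic.
best, best_time = hill_climb(start, SEARCH_SPACE, toy_runtime, toy_runtime)
print(best, best_time)
```

The point of the design is that each step costs one real job run, no matter
how many neighbors the model has to rank.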

Regarding the results, I will keep you posted for sure. Hopefully, we will
submit a paper if the work is a success.

Thanks again for your notes; they are very helpful.
Best,

Muaz TWATY
*EURA NOVA*


On Tue, 19 Mar 2019 at 16:54, Dionysios Logothetis <dlogothetis@gmail.com>
wrote:

> Hi Muaz, this is a very interesting topic!
>
> First of all, the top 2 (number of workers, heap size) are indeed the most
> important. Also, I think giraph.numComputeThreads and probably the
> netty-related thread parameters are more important than their ranking
> suggests.
>
> The following are less important:
> - giraph.maxMutationsPerRequest: mutations are a feature that probably
> kicks in for a more limited set of applications, and usually only in
> certain phases of an application. I would expect this to have limited
> impact compared to the other parameters.
> - giraph.useMessageSizeEncoding: this will be applicable to a limited set
> of applications, depending on the type of vertex IDs/values etc. they use.
>
> Also, I would exclude the following:
> - giraph.VerticesToUpdateProgress: this is just used to keep stats; it's
> not important for processing, and I doubt it will have any perf impact.
> - giraph.maxPartitionsInMemory: the out-of-core mechanism can be a bit
> unreliable, and would make your study harder.
> - giraph.checkpointFrequency: checkpointing may not be that common a
> feature, and it hasn't been properly maintained, so you may have trouble
> using it.
>
> Aside from these, you could consider some GC-related parameters: the type
> of GC (e.g. parallel), the size of the new generation, and the GC survivor
> ratio.
>
> I would love to learn more about how you'll be approaching the problem, and
> of course I'm looking forward to the results.
>
> On Wed, Mar 13, 2019 at 6:12 AM Muaz Twaty <muaz.twaty@euranova.eu> wrote:
>
>> Hello Giraph community,
>>
>> "*Parameter tuning of graph processing frameworks*" is the research topic
>> of my master's thesis. The objective of the thesis is to find an automated
>> method for choosing an optimal or near-optimal configuration for graph
>> processing frameworks. So far, I have reviewed the state of the art in the
>> optimization literature and the available graph processing frameworks.
>> *Giraph* is the first framework that I have started to explore in detail
>> and run jobs with, hoping that it will be the framework to which I apply
>> the optimization algorithms.
>>
>> My question concerns the set of parameters that should be chosen for
>> optimization. Since I am not a Giraph expert, I thought the best way is to
>> ask the community. I made a list of Giraph parameters that I think are
>> important and directly related to the framework's performance. The
>> parameters with higher ranks are the ones I think are more important. I
>> hope that you can give feedback on the list: *is it a good set of
>> parameters to optimize? Are there parameters in the set that should be
>> fixed for all kinds of jobs? Any suggestions to change the ranking, or to
>> add or remove parameters?*
>>
>> I will add more parameters regarding the hardware used (number of CPUs,
>> size of RAM per CPU and hard disk speed), but the point of this email is to
>> focus on the parameters of *Giraph*.
>>
>> Thanks,
>> Muaz TWATY
>> *EURA NOVA*
>>
>>
>> Rank  Scope   Parameter name                      Default value
>> ----  ------  ----------------------------------  ----------------------
>> 1     Hadoop  -w                                  required
>>       Number of workers.
>> 2     Hadoop  -yarnheap                           1024 (integer), MB
>>       Heap size, in MB, for each Giraph task (YARN only).
>> 3     Giraph  giraph.useInputSplitLocality        TRUE
>>       To minimize network usage when reading input splits, each worker
>>       can prioritize splits that reside on its host. This, however,
>>       comes at the cost of increased load on ZooKeeper. Hence, users
>>       with a lot of splits and input threads (or with configurations
>>       that can't exploit locality) may want to disable it.
>> 4     Giraph  giraph.useMessageSizeEncoding       FALSE
>>       Use message size encoding (typically better for complex objects,
>>       not meant for primitive wrapped messages).
>> 5     Giraph  giraph.VerticesToUpdateProgress     100000
>>       Minimum number of vertices to compute before updating worker
>>       progress.
>> 6     Giraph  giraph.maxMutationsPerRequest       100
>>       Maximum number of mutations per partition before flush.
>> 7     Giraph  giraph.maxPartitionsInMemory        0
>>       Maximum number of partitions to hold in memory for each worker.
>>       By default it is set to 0 (for adaptive out-of-core mechanism).
>> 8     Giraph  giraph.clientReceiveBufferSize      32768
>>       Client receive buffer size.
>> 9     Giraph  giraph.clientSendBufferSize         524288
>>       Client send buffer size.
>> 10    Giraph  giraph.serverReceiveBufferSize      524288
>>       Server receive buffer size.
>> 11    Giraph  giraph.serverSendBufferSize         32768
>>       Server send buffer size.
>> 12    Giraph  giraph.async.message.store.threads  0
>>       Number of threads to be used in async message store.
>> 13    Giraph  giraph.channelsPerServer            1
>>       Number of channels used per server.
>> 14    Giraph  giraph.nettyClientExecutionThreads  8
>>       Netty client execution threads (execution handler).
>> 15    Giraph  giraph.nettyClientThreads           4
>>       Netty client threads.
>> 16    Giraph  giraph.nettyServerExecutionThreads  8
>>       Netty server execution threads (execution handler).
>> 17    Giraph  giraph.nettyServerThreads           16
>>       Netty server threads.
>> 18    Giraph  giraph.numComputeThreads            1
>>       Number of threads for vertex computation.
>> 19    Giraph  giraph.checkpointFrequency          0
>>       How often to checkpoint (i.e. 0 means no checkpoint, 1 means every
>>       superstep, 2 is every two supersteps, etc.).
>>
>>
>>
>> ♻ Be green, keep it on the screen
>
>

