giraph-user mailing list archives

From Dionysios Logothetis <dlogothe...@gmail.com>
Subject Re: Giraph automated parameter tuning
Date Tue, 19 Mar 2019 15:53:47 GMT
Hi Muaz, this is a very interesting topic!

First of all, the top 2 (number of workers, heap size) are indeed the most
important. I also think giraph.numComputeThreads, and probably the
Netty-related thread parameters, are more important than their ranking
suggests.
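
As a rough illustration of how these knobs might be handed to a job, here is a minimal Python sketch that turns one candidate configuration into Giraph command-line arguments. The -w and -yarnheap flags are taken from the quoted table below; passing other options as "-ca key=value" pairs, the helper name build_giraph_args, and the sample values are assumptions for illustration, so check them against your Giraph version.

```python
def build_giraph_args(workers, heap_mb, custom_options):
    """Return command-line arguments for one candidate configuration.

    Assumption: custom Giraph options are passed as '-ca key=value' pairs.
    """
    if workers < 1:
        raise ValueError("at least one worker is required")
    args = ["-w", str(workers), "-yarnheap", str(heap_mb)]
    # Sort keys so the same configuration always yields the same command line.
    for key in sorted(custom_options):
        args += ["-ca", f"{key}={custom_options[key]}"]
    return args


# Hypothetical sample values, not recommendations.
candidate = {
    "giraph.numComputeThreads": 4,
    "giraph.nettyClientThreads": 4,
    "giraph.nettyServerThreads": 16,
}
print(" ".join(build_giraph_args(8, 4096, candidate)))
```

Keeping the argument list deterministic makes it easier to log and compare the configurations an optimizer has already tried.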

The following are less important:
- giraph.maxMutationsPerRequest: mutations are a feature that probably
kicks in for a more limited set of applications, and usually only in certain
phases of an application. I would expect this to have limited impact
compared to the other parameters.
- giraph.useMessageSizeEncoding: this applies only to a limited set of
applications, depending on the types of vertex IDs, values, etc. they use.

Also, I would exclude the following:
- giraph.VerticesToUpdateProgress: this is just used to keep stats; it's
not important for processing, and I doubt it will have any performance
impact.
- giraph.maxPartitionsInMemory: the out-of-core mechanism can be a bit
unreliable, and would make your study harder.
- giraph.checkpointFrequency: checkpointing may not be that common a
feature, and hasn't been properly maintained so you may have trouble using
it.
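
The prioritization above could be encoded as a small filter in front of the tuner. This is only a sketch: the tier names, the function parameters_to_tune, and the choice to place unlisted parameters between the two tiers are my own assumptions, and the Netty entries are just the thread options from the quoted table.

```python
# Parameters suggested as the most important ones to tune.
TUNE_FIRST = [
    "-w",
    "-yarnheap",
    "giraph.numComputeThreads",
    "giraph.nettyClientThreads",
    "giraph.nettyServerThreads",
    "giraph.nettyClientExecutionThreads",
    "giraph.nettyServerExecutionThreads",
]
# Parameters expected to matter only for some applications.
LOW_PRIORITY = [
    "giraph.maxMutationsPerRequest",
    "giraph.useMessageSizeEncoding",
]
# Parameters suggested for exclusion from the study.
EXCLUDE = {
    "giraph.VerticesToUpdateProgress",
    "giraph.maxPartitionsInMemory",
    "giraph.checkpointFrequency",
}


def parameters_to_tune(candidates):
    """Drop excluded parameters and order the rest by tier.

    Assumption: parameters in neither list get a middle rank.
    """
    rank = {name: 0 for name in TUNE_FIRST}
    rank.update({name: 2 for name in LOW_PRIORITY})
    kept = [name for name in candidates if name not in EXCLUDE]
    # sorted() is stable, so the original order is kept within each tier.
    return sorted(kept, key=lambda name: rank.get(name, 1))


print(parameters_to_tune([
    "giraph.checkpointFrequency",
    "giraph.maxMutationsPerRequest",
    "giraph.channelsPerServer",
    "-w",
]))
```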

Aside from these you could consider some GC-related parameters: the type of
the GC (e.g. parallel etc), size of new generation, GC survivor ratio.
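
To make the GC suggestion concrete, here is a small Python sketch that enumerates JVM flag combinations for the three knobs mentioned. The flags themselves (-XX:+UseParallelGC, -XX:+UseG1GC, -Xmn, -XX:SurvivorRatio) are standard HotSpot options, but the helper names and sample values are hypothetical, and how the resulting string reaches the Giraph worker JVMs depends on your Hadoop/YARN setup, which this sketch does not cover.

```python
from itertools import product

# Collector name -> HotSpot flag that selects it.
GC_COLLECTORS = {
    "parallel": "-XX:+UseParallelGC",
    "g1": "-XX:+UseG1GC",
}


def gc_flags(collector, new_gen_mb, survivor_ratio):
    """Return the JVM flags for one GC configuration."""
    return [
        GC_COLLECTORS[collector],
        f"-Xmn{new_gen_mb}m",                    # size of the new generation
        f"-XX:SurvivorRatio={survivor_ratio}",   # eden / survivor space ratio
    ]


def gc_grid(collectors, new_gen_sizes_mb, survivor_ratios):
    """Return every combination of the given values as a JVM option string."""
    combos = product(collectors, new_gen_sizes_mb, survivor_ratios)
    return [" ".join(gc_flags(c, n, s)) for c, n, s in combos]


# Hypothetical sample values, not recommendations.
for options in gc_grid(["parallel", "g1"], [256, 512], [6, 8]):
    print(options)
```

A full grid grows quickly, so in practice these values would more likely be sampled by the optimizer than enumerated exhaustively.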

I would love to learn more about how you'll be approaching the problem, and
of course I'm looking forward to the results.

On Wed, Mar 13, 2019 at 6:12 AM Muaz Twaty <muaz.twaty@euranova.eu> wrote:

> Hello Giraph community,
>
> "*Parameter tuning of graph processing frameworks*" is the domain of
> research for my master thesis. The objective of the thesis is to find an
> automated method to choose an optimal/sub-optimal configuration for the
> graph processing frameworks. At this point, I reviewed the state of the art
> in the optimization literature and reviewed the available graph processing
> frameworks. *Giraph* is the first framework that I have started to explore
> in detail and run jobs with, hoping that it will be the framework to which
> I will apply the optimization algorithms.
>
> My question is regarding the set of parameters which should be chosen to
> optimize. Since I am not a Giraph expert, I thought the best way is to ask
> the community. I made a list of Giraph parameters which I thought are
> important and are related directly to the framework performance. The
> parameters with higher ranks are parameters which I think are more
> important. I hope that you can give feedback about the list: *Is it a good
> set of parameters to optimize? Are there some parameters in the set which
> should be fixed for all different kinds of jobs? Any suggestions to change
> the ranking, or to add or remove parameters?*
>
> I will add more parameters regarding the used hardware (number of CPUs,
> size of RAM per CPU and hard disk speed), but the point of this email is to
> focus on the parameters of *Giraph.*
>
> Thanks,
> Muaz TWATY
> *EURA NOVA *
>
>
> Rank  Source  Parameter name                       Default value
>       Details
>
> 1     Hadoop  -w                                   required
>       Number of workers
> 2     Hadoop  -yarnheap                            1024 (integer) MB
>       Heap size, in MB, for each Giraph task (YARN only.)
> 3     Giraph  giraph.useInputSplitLocality         TRUE
>       To minimize network usage when reading input splits, each worker
>       can prioritize splits that reside on its host. This, however, comes
>       at the cost of increased load on ZooKeeper. Hence, users with a lot
>       of splits and input threads (or with configurations that can't
>       exploit locality) may want to disable it.
> 4     Giraph  giraph.useMessageSizeEncoding        FALSE
>       Use message size encoding (typically better for complex objects,
>       not meant for primitive wrapped messages)
> 5     Giraph  giraph.VerticesToUpdateProgress      100000
>       Minimum number of vertices to compute before updating worker
>       progress
> 6     Giraph  giraph.maxMutationsPerRequest        100
>       Maximum number of mutations per partition before flush
> 7     Giraph  giraph.maxPartitionsInMemory         0
>       Maximum number of partitions to hold in memory for each worker. By
>       default it is set to 0 (for adaptive out-of-core mechanism)
> 8     Giraph  giraph.clientReceiveBufferSize       32768
>       Client receive buffer size
> 9     Giraph  giraph.clientSendBufferSize          524288
>       Client send buffer size
> 10    Giraph  giraph.serverReceiveBufferSize       524288
>       Server receive buffer size
> 11    Giraph  giraph.serverSendBufferSize          32768
>       Server send buffer size
> 12    Giraph  giraph.async.message.store.threads   0
>       Number of threads to be used in async message store
> 13    Giraph  giraph.channelsPerServer             1
>       Number of channels used per server
> 14    Giraph  giraph.nettyClientExecutionThreads   8
>       Netty client execution threads (execution handler)
> 15    Giraph  giraph.nettyClientThreads            4
>       Netty client threads
> 16    Giraph  giraph.nettyServerExecutionThreads   8
>       Netty server execution threads (execution handler)
> 17    Giraph  giraph.nettyServerThreads            16
>       Netty server threads
> 18    Giraph  giraph.numComputeThreads             1
>       Number of threads for vertex computation
> 19    Giraph  giraph.checkpointFrequency           0
>       How often to checkpoint (i.e. 0 means no checkpoint, 1 means every
>       superstep, 2 is every two supersteps, etc.).
>
>
>
> ♻ Be green, keep it on the screen
