Mailing-List: contact user-help@giraph.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@giraph.apache.org
Message-ID: <55D5A75A.8090402@uni-ulm.de>
Date: Thu, 20 Aug 2015 12:09:30 +0200
From: Sonja Koenig <sonja.koenig@uni-ulm.de>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: user@giraph.apache.org
Subject: Giraph Performance Tuning
Content-Type: multipart/alternative;
 boundary="------------000008030403050503090600"

This is a multi-part message in MIME format.
--------------000008030403050503090600
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Hey there everyone!

I am currently writing my bachelor thesis on Giraph and GraphX, in which 
I compare their scalability and features and relate them to different 
graph types.
To compare the two on a fair basis, I want to tune both frameworks to 
get the most out of them :-)
I was hoping to get some tips and tricks from you all on which 
configuration settings have an impact on my computations.

My setup:
10 machines, each with one CPU with a single 3.3 GHz core, 4 GB RAM and 
a 100 GB HDD; one machine is the designated master
Giraph 1.10
Hadoop 1.2.1

So far I haven't made any special configuration changes for Hadoop or 
Giraph beyond the basic ones during setup.
These settings might be performance-critical:
In *mapred-site.xml*:
     mapred.tasktracker.map.tasks.maximum = 4
     mapred.map.tasks=4
In *hdfs-site.xml*:
     dfs.replication=3

If I am correctly informed, the default heap size is 1000 MB, which 
I haven't changed. I am also not sure where I can actually raise the 
memory limit. Any advice?
Also, I read somewhere that it is smarter to increase the number of 
threads per worker rather than the number of workers per machine? But I 
am somewhat handicapped anyway with only one core per machine.
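
To put numbers on the memory question, here is my back-of-the-envelope 
calculation as a small Java sketch. It assumes (possibly wrongly) that 
all four map slots are occupied at once and that each task really gets 
the 1000 MB heap mentioned above; the class and method names are made up 
for illustration.

```java
public class HeapBudget {
    // Total heap demanded on one node if every map slot runs a task
    // with the same heap size.
    static int totalHeapMb(int slotsPerNode, int heapPerSlotMb) {
        return slotsPerNode * heapPerSlotMb;
    }

    public static void main(String[] args) {
        int slots = 4;        // mapred.tasktracker.map.tasks.maximum from my config
        int heapMb = 1000;    // assumed heap per task (see caveat above)
        int ramMb = 4 * 1024; // 4 GB of RAM per machine

        int needed = totalHeapMb(slots, heapMb);
        System.out.println("heap needed: " + needed + " MB of " + ramMb + " MB RAM");
        System.out.println("headroom:    " + (ramMb - needed) + " MB");
    }
}
```

If those assumptions hold, four full slots would leave almost nothing for 
the OS and the Hadoop daemons, which is part of why I am asking.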

Lastly, has anyone noticed any performance changes when using 
checkpointing, combiners, aggregators and so on?
Is the use of combiners and aggregators decided in the application code 
or in my execution command?

I would greatly appreciate any advice and comments! :-)

Greetings from Ulm,
Sonja


--------------000008030403050503090600
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit

<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hey there everyone!<br>
    <br>
    I am currently writing my bachelor thesis on Giraph and GraphX,
    in which I compare their scalability and features and relate them
    to different graph types.<br>
    To compare the two on a fair basis, I want to tune both
    frameworks to get the most out of them :-)<br>
    I was hoping to get some tips and tricks from you all on which
    configuration settings have an impact on my computations.<br>
    <br>
    My setup:<br>
    10 machines, each with one CPU with a single 3.3 GHz core, 4 GB RAM
    and a 100 GB HDD; one machine is the designated master<br>
    Giraph 1.10<br>
    Hadoop 1.2.1<br>
    <br>
    So far I haven't made any special configuration changes for Hadoop or
    Giraph beyond the basic ones during setup.<br>
    These settings might be performance-critical:<br>
    In <b>mapred-site.xml</b>:<br>
        mapred.tasktracker.map.tasks.maximum = 4<br>
        mapred.map.tasks=4<br>
    In <b>hdfs-site.xml</b>:<br>
        dfs.replication=3<br>
    <br>
    If I am correctly informed, the default heap size is 1000 MB,
    which I haven't changed. I am also not sure where I can actually
    raise the memory limit. Any advice?<br>
    Also, I read somewhere that it is smarter to increase the number of
    threads per worker rather than the number of workers per machine? But I
    am somewhat handicapped anyway with only one core per machine.<br>
    <br>
    Lastly, has anyone noticed any performance changes when using
    checkpointing, combiners, aggregators and so on?<br>
    Is the use of combiners and aggregators decided in the application
    code or in my execution command?<br>
    <br>
    I would greatly appreciate any advice and comments! :-)<br>
    <br>
    Greetings from Ulm,<br>
    Sonja <br>
    <br>
  </body>
</html>

--------------000008030403050503090600--