From: Sonja Koenig
Date: Thu, 20 Aug 2015 12:09:30 +0200
To: user@giraph.apache.org
Subject: Giraph Performance Tuning

Hey there everyone!

I am currently writing my bachelor thesis about Giraph and GraphX, in which I compare their scalability and features and relate them to different graph types.
To compare the two on a fair basis, I want to tune both frameworks to get the most out of them :-)
I was hoping to get some tips and tricks from you all on which configuration settings have an impact on my computations.

My set up:
10 machines, each with one single-core 3.3 GHz CPU, 4 GB RAM, and a 100 GB HDD; one machine is the designated master
Giraph 1.1.0
Hadoop 1.2.1

So far I haven't made any special configuration changes for Hadoop or Giraph besides the basic ones during setup.
Performance-critical might be these:
In mapred-site.xml:
    mapred.tasktracker.map.tasks.maximum = 4
    mapred.map.tasks = 4
In hdfs-site.xml:
    dfs.replication = 3
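
For reference, here is a sketch of how these settings would appear in the files themselves, assuming the standard Hadoop 1.x property layout. Only the properties listed above are shown; everything else in my files is left at whatever the basic setup produced.

```xml
<!-- mapred-site.xml (only the properties mentioned above) -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>4</value>
  </property>
</configuration>
```

```xml
<!-- HDFS site file (hdfs-site.xml in a stock Hadoop 1.x install) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```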

If I am correctly informed, the default heap size is 1000 MB, which I haven't changed. I am also not sure where I can actually increase the available memory. Any advice?
Also, I read somewhere that it is smarter to increase the number of threads per worker rather than the number of workers per machine. In any case, I am somewhat limited with only one core per machine.

Lastly, has anyone noticed any performance changes when using checkpointing, combiners, aggregators, and so on?
Is the use of combiners and aggregators a choice of the application code or my execution command?

I would greatly appreciate any advice and comments! :-)

Greetings from Ulm,
Sonja
