From: Hassan Eslami
Date: Mon, 14 Dec 2015 16:10:46 -0600
Subject: Re: Scalability issue on Giraph
To: user@giraph.apache.org

Depending on the paper, some consider the IO time and some don't, although a fair measurement would include it. I asked you not to consider the IO time only to make the problem easier to diagnose: isolate the time so that you measure just the core computation and memory accesses.

As a side note, I assume the "-w $2" in your command translates to "-w 1", since you are running on a single machine. I also assume you change the number of compute threads via giraph.numComputeThreads, and that you are using the trunk version where GIRAPH-1035 is already applied. You can also play with the number of partitions (using giraph.userPartitionCount) and set it to a multiple of the number of physical cores. 
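As a minimal sketch, the knobs mentioned above could be combined like this. The option names (giraph.numComputeThreads, giraph.userPartitionCount) are the real Giraph properties named in this thread; the core count is an assumption, and the commented-out submission line uses placeholder paths:

```shell
# Sketch: tuning knobs from the advice above. Assumes a 16-core single
# machine; adjust CORES to match your hardware.
CORES=16
PARTITIONS=$((CORES * 4))   # oversubscription: ~4 partitions per physical core
echo "compute threads: $CORES, partitions: $PARTITIONS"

# The actual job submission (commented out; requires a running Hadoop setup,
# and $GIRAPH_JAR is a placeholder for the examples jar):
# $HADOOP_HOME/bin/hadoop jar $GIRAPH_JAR \
#   org.apache.giraph.GiraphRunner \
#   -Dgiraph.numComputeThreads=$CORES \
#   -Dgiraph.userPartitionCount=$PARTITIONS \
#   org.apache.giraph.examples.PageRankComputation \
#   -w 1 ...
```

Both properties can be passed as -D options to GiraphRunner exactly like the partitioner factory option already used in the command below.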
For instance, if your processor has 16 cores, try 16*4=64 partitions to take advantage of core oversubscription. If you have already set all these options in your execution, I personally can't think of any reason for limited scalability other than small input size (which is quite likely: with 64 partitions, each partition will hold only around 4K vertices) or memory bandwidth.

Hassan

On Mon, Dec 14, 2015 at 3:12 PM, Karos Lotfifar wrote:
> Thanks Hassan,
>
> My question is: is this the way others report superstep times in research
> papers such as Hama, Giraph, GraphLab, etc.? Are they not considering the
> I/O time? (They report good scalability for PageRank calculation.)
>
> The other issue is that even considering only the computational
> supersteps, excluding the I/O overhead, does not yield any improvement.
> Taking into account the first superstep (superstep 0 is not considered,
> as you say), the timing details for amazon would be:
>
> # Processor cores       2      4      8     16     24     32
> superstep 1 (ms)     5101   3642   3163   2867   3611   3435
>
> As you see, it is the same story, even though no I/O is included.
>
> Regards,
> Karos
>
> On Mon, Dec 14, 2015 at 9:29 PM, Hassan Eslami wrote:
>
>> Hi,
>>
>> Here is my take on it:
>> It may be a good idea to isolate the time spent in compute supersteps. To
>> do that, you can look at the per-superstep timing metrics and aggregate
>> the times for all supersteps except the input (-1) and output (last)
>> supersteps. This eliminates the time for IO and focuses only on the time
>> spent accessing memory and doing computation in the cores. In general, if
>> an application is memory-bound (for instance PageRank), increasing the
>> number of threads does not necessarily decrease the time after a certain
>> point, because memory bandwidth can become a bottleneck. In other words,
>> once the memory bandwidth has been saturated by a certain number of
>> threads, adding more threads will not decrease the execution time any
>> further.
>>
>> Best,
>> Hassan
>>
>> On Mon, Dec 14, 2015 at 5:14 AM, Foad Lotfifar wrote:
>>
>>> Hi,
>>>
>>> I have a scalability issue with Giraph and I cannot figure out where
>>> the problem is.
>>>
>>> --- Cluster specs:
>>> # nodes      1
>>> # threads    32
>>> Processor    Intel Xeon 2.0GHz
>>> OS           Ubuntu 32-bit
>>> RAM          64GB
>>>
>>> --- Giraph specs
>>> Hadoop       Apache Hadoop 1.2.1
>>> Giraph       1.2.0-SNAPSHOT
>>>
>>> Tested graphs:
>>> amazon0302           V=262,111   E=1,234,877
>>> coAuthorsCiteseer    V=227,320   E=1,628,268
>>>
>>> I run the PageRank algorithm provided with Giraph
>>> ("SimplePageRankComputation") with the following options:
>>>
>>> (time ($HADOOP_HOME/bin/hadoop jar \
>>>   $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
>>>   org.apache.giraph.GiraphRunner \
>>>   -Dgiraph.graphPartitionerFactoryClass=org.apache.giraph.partition.HashRangePartitionerFactory \
>>>   org.apache.giraph.examples.PageRankComputation \
>>>   -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
>>>   -vip /user/hduser/input/$file \
>>>   -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>>   -op /user/hduser/output/pagerank -w $2 \
>>>   -mc org.apache.giraph.examples.PageRankComputation\$PageRankMasterCompute)) 2>&1 \
>>>   | tee -a ./pagerank_results/$file.GHR_$2.$iter.output.txt
>>>
>>> The algorithm runs without any issue. The number of supersteps defaults
>>> to 31 in the algorithm.
>>>
>>> *Problem:*
>>> I don't get any scalability beyond 8 (or 16) processor cores; that is,
>>> I get speedup up to 8 (or 16) cores, and then the run time starts to
>>> increase.
>>>
>>> I have also run PageRank with only one superstep, as well as other
>>> algorithms such as the ShortestPaths algorithm, and I get the same
>>> results. I cannot figure out where the problem is.
>>>
>>> 1- I have tried changing giraph.numInputThreads and
>>> giraph.numOutputThreads: the performance gets a little bit better, but
>>> there is no impact on scalability.
>>> 2- Is it related to the size of the graphs? The graphs I am testing
>>> are small.
>>> 3- Is it a platform-related issue?
>>>
>>> These are the timing details for the amazon graph:
>>>
>>> # Processor cores         1          2          4          8         16         24         32
>>> Input                  3260       3447       3269       3921       4555       4766
>>> Initialise             3467      36458      45474      39091     100281      79012
>>> Setup                    34         52         59         70         77         86
>>> Shutdown               9954      10226      11021       9524      13393      15930
>>> Total                135482      84483      61081      52190      58921      61898
>>> HDFS READ          21097485   26117723   36158199   57808783   80086015  102163071
>>> FILE WRITE            65889     109815     197667     373429     549165     724901
>>> HDFS WRITE          7330986    7331068    7331093    7330988    7330976    7331203
>>>
>>> Best Regards,
>>> Karos
>>>
>
> --
> Regards,
> Karos Lotfifar
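To quantify the plateau described above, one can compute the speedup from the Total row of the timing table. This is a sketch: it assumes the six Total values correspond to the first six core counts in the header (the original table lists seven core counts but only six values per row):

```shell
# Speedup relative to the first Total measurement. The values are copied
# from the table above; the column-to-core-count mapping is an assumption.
awk 'BEGIN {
  n = split("135482 84483 61081 52190 58921 61898", t, " ")
  for (i = 1; i <= n; i++) printf "%.2f\n", t[1] / t[i]
}'
# prints: 1.00 1.60 2.22 2.60 2.30 2.19
```

The speedup peaks at about 2.6x and then declines, which matches a saturation effect (memory bandwidth or too little work per thread) rather than a configuration bug.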
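Hassan's earlier suggestion, aggregating the times of all supersteps except input (-1) and the final output superstep, can be sketched as a small awk filter. The "superstep milliseconds" input format below is an assumption for illustration, not Giraph's actual metrics layout:

```shell
# Hypothetical per-superstep timing lines: "<superstep> <milliseconds>".
cat > /tmp/supersteps.txt <<'EOF'
-1 3260
0 4100
1 5101
2 3642
31 9954
EOF

# Sum every superstep except input (-1) and the last (output) superstep.
awk '$1 != -1 { v[n++] = $2 }
     END { for (i = 0; i < n - 1; i++) sum += v[i]; print sum }' /tmp/supersteps.txt
# prints 12843  (= 4100 + 5101 + 3642)
```

The resulting sum isolates compute-superstep time, which is the number worth plotting against core count when diagnosing memory-bandwidth limits.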