From: Pavan Kumar A <pavanka@outlook.com>
To: user@giraph.apache.org
Subject: RE: [Solved] Giraph job hangs indefinitely and is eventually killed by JobTracker
Date: Tue, 8 Apr 2014 08:26:04 +0530

Hi Vikesh,

It seems that you are trying to run benchmarks on Giraph.
We have made a lot of improvements in 1.1.0-SNAPSHOT (though it is not released publicly in Maven, at Facebook we run all our applications on the snapshot version).
So, you can pull the latest trunk from Giraph:
git clone https://git-wip-us.apache.org/repos/asf/giraph.git
and then try running some applications.

[You are correct, we store hostname-to-task-ID mappings at the beginning of the run, so you can see such failures.]

Date: Mon, 7 Apr 2014 16:27:09 -0700
From: vikesh@stanford.edu
To: user@giraph.apache.org
Subject: [Solved] Giraph job hangs indefinitely and is eventually killed by JobTracker

Hi,

Thanks for the help! It turns out this was happening because /etc/hosts had an outdated (dynamic) IP address for the host that was being used as the master. Giraph was probably failing to communicate with the master throughout and getting stuck indefinitely.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University
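
For anyone hitting the same symptom, one way to spot a stale hosts entry is to compare what the hosts file maps the master's hostname to against the addresses the machine actually owns. The following is only a minimal sketch in Python, not part of Giraph or Hadoop. The helper names (`hosts_entries`, `is_local_address`, `stale_entries`) are made up for illustration, and the check assumes it is run on the machine that acts as the master, as in a single-machine setup.

```python
import socket


def hosts_entries(hosts_text, hostname):
    """Return the IP addresses that a hosts-file text maps to `hostname`."""
    wanted = hostname.lower()
    ips = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], [n.lower() for n in fields[1:]]
        if wanted in names:
            ips.append(ip)
    return ips


def is_local_address(ip):
    """True if `ip` can be bound here, i.e. it belongs to a local interface."""
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.bind((ip, 0))  # port 0: let the OS pick any free port
        return True
    except OSError:
        return False


def stale_entries(hosts_text, hostname):
    """Hosts-file addresses for `hostname` that this machine does not own."""
    return [ip for ip in hosts_entries(hosts_text, hostname)
            if not is_local_address(ip)]
```

Calling `stale_entries(open("/etc/hosts").read(), socket.gethostname())` on the master should return an empty list. Any address it does return is a candidate for the kind of outdated entry described above.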

From: "Vikesh Khanna" <vikesh@stanford.edu>
To: user@giraph.apache.org
Sent: Monday, April 7, 2014 2:58:13 PM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

@Pankaj, I am running the ShortestPath example on a tiny graph now (5 nodes). That is also getting hung indefinitely the exact same way. This machine has 1 TB of memory and I have used -Xmx25g (25 GB) as Java options. So hopefully it should not be because of memory limitation. [(free/total/max) = 1706.68M / 1979.75M / 25242.25M]

@Lukas, I am trying to run the example packaged with the Giraph installation - SimpleShortestPathsVertex. I haven't written any code myself yet - just trying to get this to work first. I am not getting any memory exception - no dump file is being generated at the DumpPath.

$HADOOP_HOME/bin/hadoop jar ~/.local/bin/giraph-examples.jar org.apache.giraph.GiraphRunner -D giraph.logLevel="all" -libjars ~/.local/bin/giraph-core.jar org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/vikesh/input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/vikesh/shortestPaths8 -ca SimpleShortestPathsVertex.source=2 -w 1

I am printing debug level logs now, and I am seeing these calls indefinitely in both the zookeeper and worker tasks -

2014-04-07 14:45:32,325 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 8
2014-04-07 14:45:35,326 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:45894 from job_201404071443_0001 sending #34
2014-04-07 14:45:35,327 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:45894 from job_201404071443_0001 got value #34
2014-04-07 14:45:35,327 DEBUG org.apache.hadoop.ipc.RPC: Call: ping 2
2014-04-07 14:45:38,328 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:45894 from job_201404071443_0001 sending #35
2014-04-07 14:45:38,329 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:45894 from job_201404071443_0001 got value #35
2014-04-07 14:45:38,329 DEBUG org.apache.hadoop.ipc.RPC: Call: ping 1
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Got timed signaled of false
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Wait for 0
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Got timed signaled of false
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Wait for 0

These calls go on for 10 minutes and then the job is killed by Hadoop.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University

From: "Lukas Nalezenec" <lukas.nalezenec@firma.seznam.cz>
To: user@giraph.apache.org
Sent: Monday, April 7, 2014 4:13:23 AM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

Hi,
Try making and analyzing a memory dump after the exception (JVM param -XX:+HeapDumpOnOutOfMemoryError).
What configuration (mainly Partition class) do you use?
Lukas

On 7.4.2014 11:45, Vikesh Khanna wrote:

Hi,

Any ideas why Giraph waits indefinitely? I've been stuck on this for a long time now.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University

From: "Vikesh Khanna" <vikesh@stanford.edu>
To: user@giraph.apache.org
Sent: Friday, April 4, 2014 6:06:51 AM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

Hi Avery,

I tried both the options. It does appear to be a GC problem. The problem continues with the second option as well :(. I have attached the logs after enabling the first set of options and using 1 worker. Would be very helpful if you can take a look.

This machine has 1 TB memory. We ran benchmarks of various other graph libraries on this machine and they worked fine (even with graphs 10x larger than the Giraph PageRank benchmark - 40 million nodes). I am sure Giraph would work fine as well - this should not be a resource constraint.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University

From: "Avery Ching" <aching@apache.org>
To: user@giraph.apache.org
Sent: Thursday, April 3, 2014 7:26:56 PM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

This is for a single worker, it appears. Most likely your worker went into GC and never returned. You can try with GC logging turned on; try adding something like:

-XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc

You could also try the concurrent mark/sweep collector:

-XX:+UseConcMarkSweepGC

Any chance you can use more workers and/or get more memory?

Avery
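
On Hadoop 1.x-style clusters, JVM flags like these usually reach the map tasks through the `mapred.child.java.opts` property in mapred-site.xml. As a rough sketch only, the Python below assembles the option string from the flags suggested above and renders it as a property element. The function names and the `heap` parameter are hypothetical, and the flags target the Java 6/7/8 JVMs of the time.

```python
import xml.etree.ElementTree as ET

# GC logging flags quoted from the suggestion above.
GC_LOGGING_FLAGS = [
    "-verbose:gc",
    "-XX:+PrintGC",
    "-XX:+PrintGCDetails",
    "-XX:+PrintGCTimeStamps",
    "-XX:+PrintGCDateStamps",
]


def child_java_opts(heap, gc_logging=True, use_cms=False):
    """Build a JVM option string, e.g. heap="25g" gives -Xmx25g plus flags."""
    opts = ["-Xmx" + heap]
    if gc_logging:
        opts.extend(GC_LOGGING_FLAGS)
    if use_cms:
        opts.append("-XX:+UseConcMarkSweepGC")  # concurrent mark/sweep collector
    return " ".join(opts)


def as_property(name, value):
    """Render one Hadoop <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


print(as_property("mapred.child.java.opts", child_java_opts("25g", use_cms=True)))
```

The printed element can be pasted inside the `<configuration>` block of mapred-site.xml. Trying GC logging first and the collector switch second keeps the two experiments separate, which makes the resulting task logs easier to compare.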

On 4/3/14, 5:46 PM, Vikesh Khanna wrote:

@Avery,

Thanks for the help. I checked out the task logs, and it turns out there was an exception "GC overhead limit exceeded" due to which the benchmarks wouldn't even load the vertices. I got around it by increasing the heap size (mapred.child.java.opts) in mapred-site.xml. The benchmark is loading vertices now. However, the job is still getting stuck indefinitely (and eventually killed). I have attached the small log for the map task on 1 worker. Would really appreciate it if you can help understand the cause.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University

From: "Praveen kumar s.k" <skpraveenkumar9@gmail.com>
To: user@giraph.apache.org
Sent: Thursday, April 3, 2014 4:40:07 PM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

You have given -w 30; make sure that at least that many map tasks are configured in your cluster.
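
To make the slot arithmetic concrete, here is a small Python sketch. It rests on assumptions that should be checked against your own setup: that Giraph on MapReduce needs all of its map tasks running at once, that with master and workers split it asks for one extra task per ZooKeeper server (one by default) on top of the `-w` workers, and that a stock Hadoop 0.20/1.x TaskTracker offers 2 map slots unless `mapred.tasktracker.map.tasks.maximum` is raised. The function names are hypothetical.

```python
def required_map_tasks(workers, split_master_worker=True, zookeeper_servers=1):
    """Map tasks a job with `-w workers` is assumed to need simultaneously."""
    extra = zookeeper_servers if split_master_worker else 0
    return workers + extra


def fits_in_cluster(workers, map_slots_per_tracker, task_trackers=1):
    """True if the cluster has enough map slots to run every task at once."""
    return required_map_tasks(workers) <= map_slots_per_tracker * task_trackers


# -w 30 on a single TaskTracker with the assumed default of 2 map slots.
print(required_map_tasks(30), fits_in_cluster(30, map_slots_per_tracker=2))
```

Under these assumptions a `-w 30` job needs 31 concurrent map tasks, so a pseudo-distributed cluster left at 2 slots cannot start all of them and the job waits for workers that never arrive.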

On Thu, Apr 3, 2014 at 6:24 PM, Avery Ching <aching@apache.org> wrote:
> My guess is that you don't get your resources. It would be very helpful to
> print the master log. You can find it when the job is running to look at
> the Hadoop counters on the job UI page.
>
> Avery
>
>
> On 4/3/14, 12:49 PM, Vikesh Khanna wrote:
>
> Hi,
>
> I am running the PageRank benchmark under giraph-examples from giraph-1.0.0
> release. I am using the following command to run the job (as mentioned here)
>
> vikesh@madmax
> /lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples
> $ $HADOOP_HOME/bin/hadoop jar
> $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
> org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 -w 30
>
>
> However, the job gets stuck at map 9% and is eventually killed by the
> JobTracker on reaching the mapred.task.timeout (default 10 minutes). I tried
> increasing the timeout to a very large value, and the job went on for over 8
> hours without completion. I also tried the ShortestPathsBenchmark, which
> also fails the same way.
>
>
> Any help is appreciated.
>
>
> ****** ---------------- ***********
>
>
> Machine details:
>
> Linux version 2.6.32-279.14.1.el6.x86_64
> (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat
> 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC 2012
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 64
> On-line CPU(s) list: 0-63
> Thread(s) per core: 1
> Core(s) per socket: 8
> CPU socket(s): 8
> NUMA node(s): 8
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 47
> Stepping: 2
> CPU MHz: 1064.000
> BogoMIPS: 5333.20
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 24576K
> NUMA node0 CPU(s): 1-8
> NUMA node1 CPU(s): 9-16
> NUMA node2 CPU(s): 17-24
> NUMA node3 CPU(s): 25-32
> NUMA node4 CPU(s): 0,33-39
> NUMA node5 CPU(s): 40-47
> NUMA node6 CPU(s): 48-55
> NUMA node7 CPU(s): 56-63
>
>
> I am using a pseudo-distributed Hadoop cluster on a single machine with
> 64-cores.
>
>
> *****-------------*******
>
>
> Thanks,
> Vikesh Khanna,
> Masters, Computer Science (Class of 2015)
> Stanford University