giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Giraph job hangs indefinitely and is eventually killed by JobTracker
Date Fri, 04 Apr 2014 02:26:56 GMT
This appears to be for a single worker.  Most likely your worker went 
into GC and never returned.  You can try running with GC logging turned 
on; try adding something like:

-XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps -verbose:gc

You could also try the concurrent mark/sweep collector.

-XX:+UseConcMarkSweepGC
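
For example, a minimal sketch of passing these options through 
mapred.child.java.opts in mapred-site.xml (the -Xmx value below is only 
an illustrative placeholder; size it to your cluster's memory):

  <property>
    <name>mapred.child.java.opts</name>
    <!-- heap size is an assumption; the GC flags are the ones suggested above -->
    <value>-Xmx4g -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps</value>
  </property>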

Any chance you can use more workers and/or get more memory?

Avery

On 4/3/14, 5:46 PM, Vikesh Khanna wrote:
> @Avery,
>
> Thanks for the help. I checked out the task logs, and it turns out there 
> was a "GC overhead limit exceeded" exception, because of which the 
> benchmarks wouldn't even load the vertices. I got around it by 
> increasing the heap size (mapred.child.java.opts) in mapred-site.xml. 
> The benchmark is loading vertices now. However, the job is still 
> getting stuck indefinitely (and eventually killed). I have attached 
> the small log for the map task on 1 worker. I would really appreciate 
> it if you could help me understand the cause.
>
> Thanks,
> Vikesh Khanna,
> Masters, Computer Science (Class of 2015)
> Stanford University
>
>
> ------------------------------------------------------------------------
> *From: *"Praveen kumar s.k" <skpraveenkumar9@gmail.com>
> *To: *user@giraph.apache.org
> *Sent: *Thursday, April 3, 2014 4:40:07 PM
> *Subject: *Re: Giraph job hangs indefinitely and is eventually killed 
> by JobTracker
>
> You have given -w 30; make sure that at least that many map task slots are
> configured in your cluster
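>
> For example, a minimal sketch of raising the per-TaskTracker map-slot 
> limit in mapred-site.xml (the value 32 here is an assumption; tune it 
> to what the 64-core machine can actually hold):
>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <!-- needs to be >= the -w 30 workers, typically plus one map task for the Giraph master -->
>     <value>32</value>
>   </property>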
>
> On Thu, Apr 3, 2014 at 6:24 PM, Avery Ching <aching@apache.org> wrote:
> > My guess is that you aren't getting your resources.  It would be very 
> > helpful to print the master log.  You can find it while the job is 
> > running by looking at the Hadoop counters on the job UI page.
> >
> > Avery
> >
> >
> > On 4/3/14, 12:49 PM, Vikesh Khanna wrote:
> >
> > Hi,
> >
> > I am running the PageRank benchmark under giraph-examples from the 
> > giraph-1.0.0 release. I am using the following command to run the job 
> > (as mentioned here):
> >
> > vikesh@madmax 
> > /lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples
> > $ $HADOOP_HOME/bin/hadoop jar 
> > $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar 
> > org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 
> > 50000000 -w 30
> >
> >
> > However, the job gets stuck at map 9% and is eventually killed by the 
> > JobTracker on reaching the mapred.task.timeout (default 10 minutes). 
> > I tried increasing the timeout to a very large value, and the job 
> > went on for over 8 hours without completion. I also tried the 
> > ShortestPathsBenchmark, which also fails the same way.
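> >
> > For reference, that timeout is controlled by mapred.task.timeout in 
> > mapred-site.xml (in milliseconds); a sketch of such an override, with 
> > the 8-hour value here being only an illustrative assumption:
> >
> >   <property>
> >     <name>mapred.task.timeout</name>
> >     <!-- 8 hours in milliseconds; a value of 0 disables the timeout entirely -->
> >     <value>28800000</value>
> >   </property>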
> >
> >
> > Any help is appreciated.
> >
> >
> > ****** ---------------- ***********
> >
> >
> > Machine details:
> >
> > Linux version 2.6.32-279.14.1.el6.x86_64 
> > (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.6 20120305 
> > (Red Hat 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC 2012
> >
> > Architecture: x86_64
> > CPU op-mode(s): 32-bit, 64-bit
> > Byte Order: Little Endian
> > CPU(s): 64
> > On-line CPU(s) list: 0-63
> > Thread(s) per core: 1
> > Core(s) per socket: 8
> > CPU socket(s): 8
> > NUMA node(s): 8
> > Vendor ID: GenuineIntel
> > CPU family: 6
> > Model: 47
> > Stepping: 2
> > CPU MHz: 1064.000
> > BogoMIPS: 5333.20
> > Virtualization: VT-x
> > L1d cache: 32K
> > L1i cache: 32K
> > L2 cache: 256K
> > L3 cache: 24576K
> > NUMA node0 CPU(s): 1-8
> > NUMA node1 CPU(s): 9-16
> > NUMA node2 CPU(s): 17-24
> > NUMA node3 CPU(s): 25-32
> > NUMA node4 CPU(s): 0,33-39
> > NUMA node5 CPU(s): 40-47
> > NUMA node6 CPU(s): 48-55
> > NUMA node7 CPU(s): 56-63
> >
> >
> > I am using a pseudo-distributed Hadoop cluster on a single machine with
> > 64 cores.
> >
> >
> > *****-------------*******
> >
> >
> > Thanks,
> > Vikesh Khanna,
> > Masters, Computer Science (Class of 2015)
> > Stanford University
> >
> >
> >
>

