giraph-user mailing list archives

From Lukas Nalezenec <lukas.naleze...@firma.seznam.cz>
Subject Re: Giraph job hangs indefinitely and is eventually killed by JobTracker
Date Mon, 07 Apr 2014 11:13:23 GMT
Hi,
Try taking a heap dump after the exception and analyzing it (JVM param
-XX:+HeapDumpOnOutOfMemoryError).
What configuration (mainly which Partition class) do you use?
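On a Hadoop MapReduce cluster that flag has to reach the child JVM via mapred.child.java.opts. A sketch of the relevant mapred-site.xml entry (the heap size and dump path below are example values, not taken from this thread):

```xml
<!-- mapred-site.xml sketch: -Xmx and the dump path are illustrative. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/giraph-dumps</value>
</property>
```

The resulting .hprof file can then be opened with jhat or Eclipse MAT to see which classes dominate the heap.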
Lukas

On 7.4.2014 11:45, Vikesh Khanna wrote:
> Hi,
>
> Any ideas why Giraph waits indefinitely? I've been stuck on this for a 
> long time now.
>
> Thanks,
> Vikesh Khanna,
> Masters, Computer Science (Class of 2015)
> Stanford University
>
>
> ------------------------------------------------------------------------
> *From: *"Vikesh Khanna" <vikesh@stanford.edu>
> *To: *user@giraph.apache.org
> *Sent: *Friday, April 4, 2014 6:06:51 AM
> *Subject: *Re: Giraph job hangs indefinitely and is eventually killed 
> by JobTracker
>
> Hi Avery,
>
> I tried both options. It does appear to be a GC problem; it persists
> with the second option as well :(. I have attached the logs after
> enabling the first set of options and using 1 worker. It would be very
> helpful if you could take a look.
>
> This machine has 1 TB memory. We ran benchmarks of various other graph 
> libraries on this machine and they worked fine (even with graphs 10x 
> larger than the Giraph PageRank benchmark - 40 million nodes). I am 
> sure Giraph would work fine as well - this should not be a resource 
> constraint.
>
> Thanks,
> Vikesh Khanna,
> Masters, Computer Science (Class of 2015)
> Stanford University
>
>
> ------------------------------------------------------------------------
> *From: *"Avery Ching" <aching@apache.org>
> *To: *user@giraph.apache.org
> *Sent: *Thursday, April 3, 2014 7:26:56 PM
> *Subject: *Re: Giraph job hangs indefinitely and is eventually killed 
> by JobTracker
>
> This appears to be for a single worker.  Most likely your worker went
> into GC and never returned.  You can try with GC logging turned on;
> try adding something like:
>
> -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails 
> -XX:+PrintGCTimeStamps -verbose:gc
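Once those flags are on, the task logs will contain GC lines. As a rough aid, a few lines of Python can flag long pauses; this is a sketch that assumes the usual HotSpot format where each GC line ends with "<seconds> secs]" (adjust the regex for other JVMs or flag combinations):

```python
import re

def long_pauses(gc_log_lines, threshold_secs=1.0):
    """Collect GC pause durations longer than threshold_secs.

    Assumes HotSpot GC log lines ending with '<seconds> secs]'
    (the format produced by -XX:+PrintGCDetails).
    """
    pauses = []
    for line in gc_log_lines:
        m = re.search(r'(\d+\.\d+) secs\]', line)
        if m and float(m.group(1)) > threshold_secs:
            pauses.append(float(m.group(1)))
    return pauses

# Hypothetical sample lines, not taken from the attached logs.
sample = [
    "[GC [PSYoungGen: 524288K->65536K(1048576K)] 0.0421330 secs]",
    "[Full GC [PSOldGen: 943718K->943718K(1048576K)] 12.3456780 secs]",
]
print(long_pauses(sample))  # only the multi-second Full GC survives
```

A worker that "went into GC and never returned" typically shows back-to-back Full GC entries with multi-second pauses and little memory reclaimed.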
>
> You could also try the concurrent mark/sweep collector.
>
> -XX:+UseConcMarkSweepGC
>
> Any chance you can use more workers and/or get more memory?
>
> Avery
>
> On 4/3/14, 5:46 PM, Vikesh Khanna wrote:
>
>     @Avery,
>
>     Thanks for the help. I checked the task logs, and it turns out there
>     was a "GC overhead limit exceeded" exception, because of which the
>     benchmark wouldn't even load the vertices. I got around it by
>     increasing the heap size (mapred.child.java.opts) in mapred-site.xml.
>     The benchmark is loading vertices now. However, the job still gets
>     stuck indefinitely (and is eventually killed). I have attached the
>     small log for the map task on 1 worker. I would really appreciate it
>     if you could help me understand the cause.
>
>     Thanks,
>     Vikesh Khanna,
>     Masters, Computer Science (Class of 2015)
>     Stanford University
>
>
>     ------------------------------------------------------------------------
>     *From: *"Praveen kumar s.k" <skpraveenkumar9@gmail.com>
>     *To: *user@giraph.apache.org
>     *Sent: *Thursday, April 3, 2014 4:40:07 PM
>     *Subject: *Re: Giraph job hangs indefinitely and is eventually
>     killed by JobTracker
>
>     You have given -w 30; make sure that at least that many map slots are
>     configured in your cluster.
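On Hadoop 0.20/1.x the per-TaskTracker slot count is set in mapred-site.xml; with -w 30 the job needs map slots for the 30 workers plus the master task. A sketch (the value 40 here is an example, not from this thread):

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>40</value>
</property>
```

If fewer slots than requested workers are available, the workers that do start wait indefinitely for the rest, which matches the hanging behavior described.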
>
>     On Thu, Apr 3, 2014 at 6:24 PM, Avery Ching <aching@apache.org> wrote:
>     > My guess is that you aren't getting your resources.  It would be
>     > very helpful to print the master log.  You can find it while the
>     > job is running by looking at the Hadoop counters on the job UI page.
>     >
>     > Avery
>     >
>     >
>     > On 4/3/14, 12:49 PM, Vikesh Khanna wrote:
>     >
>     > Hi,
>     >
>     > I am running the PageRank benchmark under giraph-examples from the
>     > giraph-1.0.0 release. I am using the following command to run the
>     > job (as mentioned here):
>     >
>     > vikesh@madmax
>     > /lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples
>     > $ $HADOOP_HOME/bin/hadoop jar \
>     >     $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
>     >     org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 -w 30
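For scale, a back-of-envelope sizing of this run; the per-vertex and per-edge byte counts below are assumptions for illustration, not measured Giraph 1.0 numbers:

```python
# Rough sizing sketch for 50M vertices, 1 edge each, spread over 30 workers.
vertices = 50_000_000
edges_per_vertex = 1          # -e 1
workers = 30                  # -w 30
bytes_per_vertex = 100        # assumed: object headers, id, value, buffers
bytes_per_edge = 32           # assumed

total_gb = vertices * (bytes_per_vertex + edges_per_vertex * bytes_per_edge) / 1e9
per_worker_gb = total_gb / workers
print(round(total_gb, 1), round(per_worker_gb, 2))
```

Even with generous per-object overheads this is a few gigabytes in total, far below the machine's 1 TB; the binding limit is the per-task heap set via mapred.child.java.opts, not physical memory.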
>     >
>     >
>     > However, the job gets stuck at map 9% and is eventually killed by
>     > the JobTracker on reaching mapred.task.timeout (default 10 minutes).
>     > I tried increasing the timeout to a very large value, and the job
>     > ran for over 8 hours without completing. I also tried the
>     > ShortestPathsBenchmark, which fails the same way.
>     >
>     >
>     > Any help is appreciated.
>     >
>     >
>     > ****** ---------------- ***********
>     >
>     >
>     > Machine details:
>     >
>     > Linux version 2.6.32-279.14.1.el6.x86_64
>     > (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.6 20120305
>     (Red Hat
>     > 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC 2012
>     >
>     > Architecture: x86_64
>     > CPU op-mode(s): 32-bit, 64-bit
>     > Byte Order: Little Endian
>     > CPU(s): 64
>     > On-line CPU(s) list: 0-63
>     > Thread(s) per core: 1
>     > Core(s) per socket: 8
>     > CPU socket(s): 8
>     > NUMA node(s): 8
>     > Vendor ID: GenuineIntel
>     > CPU family: 6
>     > Model: 47
>     > Stepping: 2
>     > CPU MHz: 1064.000
>     > BogoMIPS: 5333.20
>     > Virtualization: VT-x
>     > L1d cache: 32K
>     > L1i cache: 32K
>     > L2 cache: 256K
>     > L3 cache: 24576K
>     > NUMA node0 CPU(s): 1-8
>     > NUMA node1 CPU(s): 9-16
>     > NUMA node2 CPU(s): 17-24
>     > NUMA node3 CPU(s): 25-32
>     > NUMA node4 CPU(s): 0,33-39
>     > NUMA node5 CPU(s): 40-47
>     > NUMA node6 CPU(s): 48-55
>     > NUMA node7 CPU(s): 56-63
>     >
>     >
>     > I am using a pseudo-distributed Hadoop cluster on a single
>     machine with
>     > 64-cores.
>     >
>     >
>     > *****-------------*******
>     >
>     >
>     > Thanks,
>     > Vikesh Khanna,
>     > Masters, Computer Science (Class of 2015)
>     > Stanford University
>     >
>     >
>     >
>
>
>
>

