giraph-user mailing list archives

From Vikesh Khanna <vik...@stanford.edu>
Subject [Solved] Giraph job hangs indefinitely and is eventually killed by JobTracker
Date Mon, 07 Apr 2014 23:27:09 GMT
Hi, 

Thanks for the help! It turns out this was happening because /etc/hosts had an outdated (dynamic) IP address
for the host being used as the master. Giraph was probably failing to communicate with the
master the whole time and hanging indefinitely. 
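For anyone who hits this later: a quick way to catch a stale /etc/hosts entry is to compare what the file maps the master's hostname to against the machine's current address. A minimal sketch (the helper name lookup_ip and the sample entries are mine, not from this thread):

```shell
# lookup_ip FILE NAME: print the IP that an /etc/hosts-style file maps NAME to.
lookup_ip() {
  awk -v name="$2" '$1 !~ /^#/ {
    for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit }
  }' "$1"
}

# Compare the mapped address with the host's current (possibly dynamic) address:
#   lookup_ip /etc/hosts "$(hostname)"     vs.     hostname -I
```

If the two disagree, workers keep trying to reach the master at a dead address, which matches the silent hang described above.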

Thanks, 
Vikesh Khanna, 
Masters, Computer Science (Class of 2015) 
Stanford University 


----- Original Message -----

From: "Vikesh Khanna" <vikesh@stanford.edu> 
To: user@giraph.apache.org 
Sent: Monday, April 7, 2014 2:58:13 PM 
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker 

@Pankaj, I am running the ShortestPath example on a tiny graph now (5 nodes). That also hangs
indefinitely in exactly the same way. This machine has 1 TB of memory and I have set -Xmx25g (25 GB)
in the Java options, so this should not be a memory limitation. [ (free/total/max)
= 1706.68M / 1979.75M / 25242.25M ] 

@Lukas, I am trying to run the example packaged with the Giraph installation - SimpleShortestPathsVertex.
I haven't written any code myself yet - just trying to get this to work first. I am not getting
any memory exception - no dump file is being generated at the DumpPath. 

$HADOOP_HOME/bin/hadoop jar ~/.local/bin/giraph-examples.jar org.apache.giraph.GiraphRunner \
  -D giraph.logLevel="all" \
  -libjars ~/.local/bin/giraph-core.jar \
  org.apache.giraph.examples.SimpleShortestPathsVertex \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/vikesh/input/tiny_graph.txt \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/vikesh/shortestPaths8 \
  -ca SimpleShortestPathsVertex.source=2 \
  -w 1
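In case it helps anyone reproduce this: JsonLongDoubleFloatDoubleVertexInputFormat expects one JSON array per line of the form [vertexId, vertexValue, [[targetId, edgeWeight], ...]]. A five-vertex input in that shape (sample values of my own, not the actual tiny_graph.txt contents) looks like:

```json
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
```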

I am printing debug level logs now, and I am seeing these calls indefinitely in both the zookeeper
and worker tasks - 
2014-04-07 14:45:32,325 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 8
2014-04-07 14:45:35,326 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to
/127.0.0.1:45894 from job_201404071443_0001 sending #34
2014-04-07 14:45:35,327 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to
/127.0.0.1:45894 from job_201404071443_0001 got value #34
2014-04-07 14:45:35,327 DEBUG org.apache.hadoop.ipc.RPC: Call: ping 2
2014-04-07 14:45:38,328 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to
/127.0.0.1:45894 from job_201404071443_0001 sending #35
2014-04-07 14:45:38,329 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to
/127.0.0.1:45894 from job_201404071443_0001 got value #35
2014-04-07 14:45:38,329 DEBUG org.apache.hadoop.ipc.RPC: Call: ping 1
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Got timed signaled
of false
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Wait for 0
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Got timed signaled
of false
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Wait for 0 
These calls go on for 10 minutes and then the job is killed by Hadoop. 
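(For reference, the 10-minute kill comes from mapred.task.timeout, which defaults to 600000 ms. Raising it in mapred-site.xml only buys debugging time and won't fix the underlying hang; the value below is just an example:)

```xml
<!-- mapred-site.xml: example only; 1800000 ms = 30 minutes (default is 600000) -->
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value>
</property>
```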

Thanks, 
Vikesh Khanna, 
Masters, Computer Science (Class of 2015) 
Stanford University 


----- Original Message -----

From: "Lukas Nalezenec" <lukas.nalezenec@firma.seznam.cz> 
To: user@giraph.apache.org 
Sent: Monday, April 7, 2014 4:13:23 AM 
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker 

Hi, 
Try creating and analyzing a memory dump after the exception (JVM param -XX:+HeapDumpOnOutOfMemoryError). 
What configuration (mainly Partition class) do you use ? 
Lukas 

On 7.4.2014 11:45, Vikesh Khanna wrote: 



Hi, 

Any ideas why Giraph waits indefinitely? I've been stuck on this for a long time now. 

Thanks, 
Vikesh Khanna, 
Masters, Computer Science (Class of 2015) 
Stanford University 


----- Original Message -----

From: "Vikesh Khanna" <vikesh@stanford.edu> 
To: user@giraph.apache.org 
Sent: Friday, April 4, 2014 6:06:51 AM 
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker 

Hi Avery, 

I tried both options. It does appear to be a GC problem, and it continues with the second
option as well :(. I have attached the logs after enabling the first set of options
and using 1 worker. It would be very helpful if you could take a look. 

This machine has 1 TB of memory. We ran benchmarks of various other graph libraries on this machine
and they worked fine (even with graphs 10x larger than the Giraph PageRank benchmark - 40
million nodes). I am sure Giraph would work fine as well; this should not be a resource constraint.


Thanks, 
Vikesh Khanna, 
Masters, Computer Science (Class of 2015) 
Stanford University 


----- Original Message -----

From: "Avery Ching" <aching@apache.org> 
To: user@giraph.apache.org 
Sent: Thursday, April 3, 2014 7:26:56 PM 
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker 

This appears to be for a single worker. Most likely your worker went into GC and never returned.
You can try running with GC logging turned on; try adding something like: 

-XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc


You could also try the concurrent mark/sweep collector. 

-XX:+UseConcMarkSweepGC 
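Once GC logging is on, a quick filter over the child task's stdout log can show whether long pauses line up with the hang. A rough sketch (the helper name and the 1-second threshold are my own; it assumes pause times printed as "N.NNNNNNN secs", as -XX:+PrintGCDetails does):

```shell
# long_pauses FILE: print GC pause times over 1 second from a -verbose:gc log.
long_pauses() {
  grep -oE '[0-9]+\.[0-9]+ secs' "$1" | awk '$1 > 1.0 { print $1 " s" }'
}
```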

Any chance you can use more workers and/or get more memory? 

Avery 

On 4/3/14, 5:46 PM, Vikesh Khanna wrote: 


@Avery, 

Thanks for the help. I checked the task logs, and it turns out there was a "GC overhead limit
exceeded" exception, due to which the benchmarks wouldn't even load the vertices. I got
around it by increasing the heap size (mapred.child.java.opts) in mapred-site.xml. The benchmark
loads vertices now. However, the job still gets stuck indefinitely (and is eventually
killed). I have attached the small log for the map task on 1 worker. I would really appreciate
help understanding the cause. 
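(For readers following along, the heap increase mentioned above goes into mapred-site.xml; the value here is illustrative, not the exact setting used:)

```xml
<!-- mapred-site.xml: illustrative heap/diagnostic opts for task JVMs -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx25g -XX:+HeapDumpOnOutOfMemoryError</value>
</property>
```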

Thanks, 
Vikesh Khanna, 
Masters, Computer Science (Class of 2015) 
Stanford University 


----- Original Message -----

From: "Praveen kumar s.k" <skpraveenkumar9@gmail.com> 
To: user@giraph.apache.org 
Sent: Thursday, April 3, 2014 4:40:07 PM 
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker 

You have given -w 30; make sure that many map tasks are 
configured in your cluster. 

On Thu, Apr 3, 2014 at 6:24 PM, Avery Ching <aching@apache.org> wrote: 
> My guess is that you aren't getting your resources. It would be very helpful to 
> print the master log; you can find it while the job is running by looking at 
> the Hadoop counters on the job UI page. 
> 
> Avery 
> 
> 
> On 4/3/14, 12:49 PM, Vikesh Khanna wrote: 
> 
> Hi, 
> 
> I am running the PageRank benchmark under giraph-examples from giraph-1.0.0 
> release. I am using the following command to run the job (as mentioned here) 
> 
> vikesh@madmax /lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples 
> $ $HADOOP_HOME/bin/hadoop jar \ 
>     $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \ 
>     org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 -w 30 
> 
> 
> However, the job gets stuck at map 9% and is eventually killed by the 
> JobTracker on reaching the mapred.task.timeout (default 10 minutes). I tried 
> increasing the timeout to a very large value, and the job went on for over 8 
> hours without completion. I also tried the ShortestPathsBenchmark, which 
> also fails the same way. 
> 
> 
> Any help is appreciated. 
> 
> 
> ****** ---------------- *********** 
> 
> 
> Machine details: 
> 
> Linux version 2.6.32-279.14.1.el6.x86_64 
> ( mockbuild@c6b8.bsys.dev.centos.org ) (gcc version 4.4.6 20120305 (Red Hat 
> 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC 2012 
> 
> Architecture: x86_64 
> CPU op-mode(s): 32-bit, 64-bit 
> Byte Order: Little Endian 
> CPU(s): 64 
> On-line CPU(s) list: 0-63 
> Thread(s) per core: 1 
> Core(s) per socket: 8 
> CPU socket(s): 8 
> NUMA node(s): 8 
> Vendor ID: GenuineIntel 
> CPU family: 6 
> Model: 47 
> Stepping: 2 
> CPU MHz: 1064.000 
> BogoMIPS: 5333.20 
> Virtualization: VT-x 
> L1d cache: 32K 
> L1i cache: 32K 
> L2 cache: 256K 
> L3 cache: 24576K 
> NUMA node0 CPU(s): 1-8 
> NUMA node1 CPU(s): 9-16 
> NUMA node2 CPU(s): 17-24 
> NUMA node3 CPU(s): 25-32 
> NUMA node4 CPU(s): 0,33-39 
> NUMA node5 CPU(s): 40-47 
> NUMA node6 CPU(s): 48-55 
> NUMA node7 CPU(s): 56-63 
> 
> 
> I am using a pseudo-distributed Hadoop cluster on a single machine with 
> 64-cores. 
> 
> 
> *****-------------******* 
> 
> 
> Thanks, 
> Vikesh Khanna, 
> Masters, Computer Science (Class of 2015) 
> Stanford University 
> 
> 
> 



