Mailing-List: contact user-help@giraph.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@giraph.apache.org
Received-SPF: pass (nike.apache.org: domain of pavanka@outlook.com designates
 65.54.190.223 as permitted sender)
Message-ID: <BAY176-W11B1C15E3347ECC6840DFABE6B0@phx.gbl>
Content-Type: multipart/alternative;
	boundary="_11dfd472-fb63-47e0-87cc-b2222b24cf68_"
From: Pavan Kumar A <pavanka@outlook.com>
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: RE: [Solved] Giraph job hangs indefinitely and is eventually killed
 by JobTracker
Date: Tue, 8 Apr 2014 08:26:04 +0530
Importance: Normal
In-Reply-To: <1998640635.921354.1396913229980.JavaMail.zimbra@stanford.edu>
References: <43542824.4019222.1396554548120.JavaMail.zimbra@stanford.edu>
 <CAPidEUwfswoz8CHCDOhV_Xk4xMc1GqD2-QC9wjp1fQGHzb=8Zg@mail.gmail.com>
 <1696746208.4389778.1396572398582.JavaMail.zimbra@stanford.edu>
 <533E1870.1070801@apache.org>
 <356765409.4601252.1396616811169.JavaMail.zimbra@stanford.edu>
 <1904550749.28380.1396863944166.JavaMail.zimbra@stanford.edu>
 <53428853.6070209@firma.seznam.cz>
 <824427340.772045.1396907893323.JavaMail.zimbra@stanford.edu>,<1998640635.921354.1396913229980.JavaMail.zimbra@stanford.edu>
MIME-Version: 1.0

--_11dfd472-fb63-47e0-87cc-b2222b24cf68_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi Vikesh,
It seems that you are trying to run benchmarks on Giraph. We have made a lot of improvements in 1.1.0-SNAPSHOT (it is not released publicly in Maven yet, but at Facebook we run all our applications on the snapshot version). So you can pull the latest trunk from Giraph:
git clone https://git-wip-us.apache.org/repos/asf/giraph.git
and then try running some applications.
[You are correct: we store the hostname-to-task-ID mappings at the beginning of the run, so you can see failures like this one.]
Date: Mon, 7 Apr 2014 16:27:09 -0700
From: vikesh@stanford.edu
To: user@giraph.apache.org
Subject: [Solved] Giraph job hangs indefinitely and is eventually killed by JobTracker

Hi,

Thanks for the help! It turns out this was happening because /etc/hosts had an outdated (dynamically assigned) IP address for the host that was being used as the master. Giraph was probably failing to communicate with the master throughout the run and getting stuck indefinitely.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University
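Below is a minimal sketch of a check that could catch this kind of stale entry before a job is launched. It reads the address that a hosts-format file maps to a hostname and compares it with the address the machine currently has. `hosts_ip_for` and `hosts_entry_matches` are hypothetical helpers, not part of Giraph or Hadoop. The sketch assumes one relevant address per host.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Giraph or Hadoop) for spotting a stale
# /etc/hosts entry like the one that stalled this job.

# Print the address that a hosts-format FILE maps to NAME (first match wins).
hosts_ip_for() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v n="$name" '{
    sub(/#.*/, "")                      # drop comments; awk re-splits the fields
    for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit }
  }' "$file"
}

# Succeed if FILE maps NAME to EXPECTED_IP; otherwise explain the mismatch on stderr.
hosts_entry_matches() {
  local name="$1" expected_ip="$2" file="${3:-/etc/hosts}"
  local mapped
  mapped="$(hosts_ip_for "$name" "$file")"
  if [ "$mapped" = "$expected_ip" ]; then
    return 0
  fi
  echo "hosts entry for $name is '${mapped:-<none>}', expected $expected_ip" >&2
  return 1
}
```

On the master host, something like `hosts_entry_matches "$(hostname)" "$(hostname -I | awk '{print $1}')"` would report the mismatch. This assumes the local `hostname` supports `-I` and that its first address is the one the other tasks should reach.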


From: "Vikesh Khanna" <vikesh@stanford.edu>
To: user@giraph.apache.org
Sent: Monday, April 7, 2014 2:58:13 PM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

@Pankaj, I am now running the ShortestPath example on a tiny graph (5 nodes). That also hangs indefinitely in exactly the same way. This machine has 1 TB of memory and I have used -Xmx25g (25 GB) as the Java options, so hopefully it is not a memory limitation. [(free/total/max) = 1706.68M / 1979.75M / 25242.25M]

@Lukas, I am trying to run the example packaged with the Giraph installation, SimpleShortestPathsVertex. I haven't written any code myself yet; I am just trying to get this to work first. I am not getting any memory exception, and no dump file is being generated at the DumpPath.

$HADOOP_HOME/bin/hadoop jar ~/.local/bin/giraph-examples.jar org.apache.giraph.GiraphRunner \
  -D giraph.logLevel="all" \
  -libjars ~/.local/bin/giraph-core.jar \
  org.apache.giraph.examples.SimpleShortestPathsVertex \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/vikesh/input/tiny_graph.txt \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/vikesh/shortestPaths8 \
  -ca SimpleShortestPathsVertex.source=2 \
  -w 1

I am printing debug-level logs now, and I am seeing these calls repeat indefinitely in both the ZooKeeper and worker tasks:

2014-04-07 14:45:32,325 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 8
2014-04-07 14:45:35,326 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:45894 from job_201404071443_0001 sending #34
2014-04-07 14:45:35,327 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:45894 from job_201404071443_0001 got value #34
2014-04-07 14:45:35,327 DEBUG org.apache.hadoop.ipc.RPC: Call: ping 2
2014-04-07 14:45:38,328 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:45894 from job_201404071443_0001 sending #35
2014-04-07 14:45:38,329 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:45894 from job_201404071443_0001 got value #35
2014-04-07 14:45:38,329 DEBUG org.apache.hadoop.ipc.RPC: Call: ping 1
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Got timed signaled of false
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Wait for 0
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Got timed signaled of false
2014-04-07 14:45:38,910 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Wait for 0

These calls go on for 10 minutes, and then the job is killed by Hadoop.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University


From: "Lukas Nalezenec" <lukas.nalezenec@firma.seznam.cz>
To: user@giraph.apache.org
Sent: Monday, April 7, 2014 4:13:23 AM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

Hi,
Try taking a heap dump after the exception and analyzing it (JVM parameter -XX:+HeapDumpOnOutOfMemoryError).
What configuration (mainly the Partition class) do you use?
Lukas
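Below is a minimal sketch of the heap-dump suggestion. It builds the HotSpot flags and then locates the newest dump. `heap_dump_opts`, `latest_heap_dump`, and any dump directory you pass in are hypothetical. The sketch assumes `-XX:HeapDumpPath` names a directory that already exists on the node running the task; in that case HotSpot names each dump `java_pid<pid>.hprof`. "GC overhead limit exceeded" is itself an OutOfMemoryError, so this flag would fire for the error reported on 3 April. A hang with no OOM, as on 7 April, produces no dump at all.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around HotSpot's heap-dump-on-OOM flags.

# Print JVM flags that make a JVM write a heap dump into DIR on OutOfMemoryError.
heap_dump_opts() {
  local dir="$1"
  printf '%s\n' "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${dir}"
}

# Print the most recently modified java_pid*.hprof file in DIR (nothing if none exist).
latest_heap_dump() {
  local dir="$1"
  ls -t "$dir"/java_pid*.hprof 2>/dev/null | head -n 1
}
```

The flags would typically be appended to the task JVM options, e.g. `-D mapred.child.java.opts="-Xmx25g $(heap_dump_opts /tmp/giraph-dumps)"`.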

On 7.4.2014 11:45, Vikesh Khanna wrote:

Hi,

Any ideas why Giraph waits indefinitely? I've been stuck on this for a long time now.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University


From: "Vikesh Khanna" <vikesh@stanford.edu>
To: user@giraph.apache.org
Sent: Friday, April 4, 2014 6:06:51 AM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

Hi Avery,

I tried both options. It does appear to be a GC problem. The problem continues with the second option as well :(. I have attached the logs after enabling the first set of options and using 1 worker. It would be very helpful if you could take a look.

This machine has 1 TB of memory. We ran benchmarks of various other graph libraries on this machine and they worked fine (even with graphs 10x larger than the Giraph PageRank benchmark, 40 million nodes). I am sure Giraph would work fine as well; this should not be a resource constraint.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University


From: "Avery Ching" <aching@apache.org>
To: user@giraph.apache.org
Sent: Thursday, April 3, 2014 7:26:56 PM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

It appears this is for a single worker. Most likely your worker went into GC and never returned. You can try turning GC logging on by adding something like:

-XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc

You could also try the concurrent mark/sweep collector:

-XX:+UseConcMarkSweepGC

Any chance you can use more workers and/or get more memory?

Avery
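Below is a minimal sketch of the GC suggestion. It assumes the flags reach the task JVMs through `mapred.child.java.opts`, the property mentioned later in this thread. The helper name `gc_logging_opts` is hypothetical. With `-verbose:gc` and the `Print*` flags, HotSpot writes GC events to the child JVM's stdout, which Hadoop keeps in each task attempt's stdout log.

```bash
#!/usr/bin/env bash
# Hypothetical helper: assemble a mapred.child.java.opts value with GC logging.
# Usage: gc_logging_opts HEAP [cms]
gc_logging_opts() {
  local heap="$1" collector="${2:-}"
  local opts="-Xmx${heap} -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails"
  opts="$opts -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps"
  # Optionally switch to the concurrent mark/sweep collector.
  if [ "$collector" = "cms" ]; then
    opts="$opts -XX:+UseConcMarkSweepGC"
  fi
  printf '%s\n' "$opts"
}
```

For example: `$HADOOP_HOME/bin/hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner -D mapred.child.java.opts="$(gc_logging_opts 25g cms)" ...`. A per-job `-D` only takes effect if the cluster's mapred-site.xml does not mark the property final.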

On 4/3/14, 5:46 PM, Vikesh Khanna wrote:

@Avery,

Thanks for the help. I checked the task logs, and it turns out there was a "GC overhead limit exceeded" exception, because of which the benchmarks wouldn't even load the vertices. I got around it by increasing the heap size (mapred.child.java.opts) in mapred-site.xml. The benchmark is loading vertices now. However, the job is still getting stuck indefinitely (and is eventually killed). I have attached the small log for the map task on 1 worker. I would really appreciate it if you could help me understand the cause.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University


From: "Praveen kumar s.k" <skpraveenkumar9@gmail.com>
To: user@giraph.apache.org
Sent: Thursday, April 3, 2014 4:40:07 PM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

You have given -w 30, so make sure your cluster is configured for that many map tasks.
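Below is a minimal sketch of the slot check. It assumes Giraph needs one map slot per worker plus a small number of extra tasks running at the same time (default 1, for the master). The exact overhead depends on the Giraph version and on whether ZooKeeper runs inside a task. It also assumes that Hadoop 1's `mapred.tasktracker.map.tasks.maximum` (default 2) sets the slots per TaskTracker. The helper name `slots_sufficient` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: can the cluster run all Giraph tasks at once?
# Usage: slots_sufficient WORKERS SLOTS_PER_TRACKER [TRACKERS] [EXTRA_TASKS]
slots_sufficient() {
  local workers="$1" slots="$2" trackers="${3:-1}" extra="${4:-1}"
  # Assumption: one map task per worker plus EXTRA tasks (default 1, the master).
  local needed=$(( workers + extra ))
  local available=$(( slots * trackers ))
  if [ "$available" -ge "$needed" ]; then
    return 0
  fi
  printf 'need %d concurrent map slots for -w %d; cluster offers %d\n' \
    "$needed" "$workers" "$available" >&2
  return 1
}
```

Suppose the single pseudo-distributed TaskTracker in this thread were left at the default of 2 map slots. Then `slots_sufficient 30 2` fails, and tasks that never get a slot would leave the others waiting.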

On Thu, Apr 3, 2014 at 6:24 PM, Avery Ching <aching@apache.org> wrote:
> My guess is that you aren't getting your resources. It would be very helpful to
> print the master log. You can find it while the job is running by looking at
> the Hadoop counters on the job UI page.
>
> Avery
>
> On 4/3/14, 12:49 PM, Vikesh Khanna wrote:
>
> Hi,
>
> I am running the PageRank benchmark under giraph-examples from the
> giraph-1.0.0 release. I am using the following command to run the job
> (as mentioned here):
>
> vikesh@madmax /lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples
> $ $HADOOP_HOME/bin/hadoop jar \
>     $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
>     org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 -w 30
>
> However, the job gets stuck at map 9% and is eventually killed by the
> JobTracker on reaching mapred.task.timeout (default 10 minutes). I tried
> increasing the timeout to a very large value, and the job went on for over
> 8 hours without completing. I also tried the ShortestPathsBenchmark, which
> fails the same way.
>
> Any help is appreciated.
>
> ****** ---------------- ***********
>
> Machine details:
>
> Linux version 2.6.32-279.14.1.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org)
> (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC 2012
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 64
> On-line CPU(s) list: 0-63
> Thread(s) per core: 1
> Core(s) per socket: 8
> CPU socket(s): 8
> NUMA node(s): 8
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 47
> Stepping: 2
> CPU MHz: 1064.000
> BogoMIPS: 5333.20
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 24576K
> NUMA node0 CPU(s): 1-8
> NUMA node1 CPU(s): 9-16
> NUMA node2 CPU(s): 17-24
> NUMA node3 CPU(s): 25-32
> NUMA node4 CPU(s): 0,33-39
> NUMA node5 CPU(s): 40-47
> NUMA node6 CPU(s): 48-55
> NUMA node7 CPU(s): 56-63
>
> I am using a pseudo-distributed Hadoop cluster on a single machine with
> 64 cores.
>
> *****-------------*******
>
> Thanks,
> Vikesh Khanna,
> Masters, Computer Science (Class of 2015)
> Stanford University

--_11dfd472-fb63-47e0-87cc-b2222b24cf68_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<html>
<head>
<style><!--
.hmmessage P
{
margin:0px=3B
padding:0px
}
body.hmmessage
{
font-size: 12pt=3B
font-family:Calibri
}
--></style></head>
<body class=3D'hmmessage'><div dir=3D'ltr'>Hi Vikesh=2C<div><br></div><div>=
It seems that you are trying to run benchmarks on giraph.</div><div>We had =
a lot of improvements in 1.1.0-SNAPSHOT - (though it is not released public=
ly in maven at Facebook we run all our applications on the snapshot version=
)</div><div>So=2C you can pull the latest trunk from giraph:&nbsp=3B</div><=
div>git clone https://git-wip-us.apache.org/repos/asf/giraph.git<br>And the=
n try running some applications.</div><div><br></div><div>[you are correct=
=2C we store hostnames-taskid mappings in the beginning of the run=2C so u =
can see such failures]<br><div><hr id=3D"stopSpelling">Date: Mon=2C 7 Apr 2=
014 16:27:09 -0700<br>From: vikesh@stanford.edu<br>To: user@giraph.apache.o=
rg<br>Subject: [Solved] Giraph job hangs indefinitely and is eventually kil=
led by JobTracker<br><br><div style=3D"font-family:times new roman=2C new y=
ork=2C times=2C serif=3Bfont-size:12pt=3Bcolor:#000000=3B"><div>Hi=2C&nbsp=
=3B<br></div><div><br></div><div>Thanks for the help! Turns out this was ha=
ppening because /etc/hosts had an outdated IP address (dynamic) for the hos=
t that was being used as the master. Giraph was probably failing to communi=
cate with the master throughout and getting stuck indefinitely.</div><div><=
span style=3D"font-size:12pt=3B"><br></span></div><div><span style=3D"font-=
size:12pt=3B">Thanks=2C</span></div><div><span></span>Vikesh Khanna=2C<br>M=
asters=2C Computer Science (Class of 2015)<br>Stanford University<br><span>=
</span><br></div><div><br></div><hr id=3D"ecxzwchr"><div style=3D"color:#00=
0=3Bfont-weight:normal=3Bfont-style:normal=3Btext-decoration:none=3Bfont-fa=
mily:Helvetica=2CArial=2Csans-serif=3Bfont-size:12pt=3B"><b>From: </b>"Vike=
sh Khanna" &lt=3Bvikesh@stanford.edu&gt=3B<br><b>To: </b>user@giraph.apache=
.org<br><b>Sent: </b>Monday=2C April 7=2C 2014 2:58:13 PM<br><b>Subject: </=
b>Re: Giraph job hangs indefinitely and is eventually killed by JobTracker<=
br><div><br></div><div style=3D"font-family:times new roman=2C new york=2C =
times=2C serif=3Bfont-size:12pt=3Bcolor:#000000=3B"><div>@Pankaj=2C I am ru=
nning the ShortestPath example on a tiny graph now (5 nodes). That is also =
getting hung indefinitely the exact same way. This machine has 1 TB of memo=
ry and I have used -Xmx25g (25 GB)&nbsp=3B<br></div><div>as Java options. S=
o hopefully it should not be because of memory limitation. &nbsp=3B[<span s=
tyle=3D"font-size:small=3B">(free/total/max) =3D 1706.68M / 1979.75M / 2524=
2.25M<span style=3D"font-size:medium=3B">]</span><br></span></div><div><br>=
</div><div>@Lukas=2C I am trying to run the example packaged with the Girap=
h installation - SimpleShortestPathsVertex. I haven't written any code myse=
lf yet - just trying to get this to work first. I am not getting any memory=
 exception - no dump file is being generated at the DumpPath.</div><div><br=
></div><div>$HADOOP_HOME/bin/hadoop jar ~/.local/bin/giraph-examples.jar or=
g.apache.giraph.GiraphRunner -D giraph.logLevel=3D"all" -libjars ~/.local/b=
in/giraph-core.jar org.apache.giraph.examples.SimpleShortestPathsVertex -vi=
f org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -=
vip /user/vikesh/input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWi=
thValueTextOutputFormat -op /user/vikesh/shortestPaths8 -ca SimpleShortestP=
athsVertex.source=3D2 -w 1</div><div><span><br></span></div><div>I am print=
ing debug level logs now=2C and I am seeing these calls indefinitely in bot=
h the zookeeper and worker tasks -&nbsp=3B</div><div><pre id=3D"ecxaeaoofnh=
gocdbnbeljkmbjdmhbcokfdb-mousedown"><span style=3D"font-size:small=3B">2014=
-04-07 14:45:32=2C325 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 8=
=0A=
2014-04-07 14:45:35=2C326 DEBUG org.apache.hadoop.ipc.Client: IPC Client (4=
7) connection to /127.0.0.1:45894 from job_201404071443_0001 sending #34=0A=
2014-04-07 14:45:35=2C327 DEBUG org.apache.hadoop.ipc.Client: IPC Client (4=
7) connection to /127.0.0.1:45894 from job_201404071443_0001 got value #34=
=0A=
2014-04-07 14:45:35=2C327 DEBUG org.apache.hadoop.ipc.RPC: Call: ping 2=0A=
2014-04-07 14:45:38=2C328 DEBUG org.apache.hadoop.ipc.Client: IPC Client (4=
7) connection to /127.0.0.1:45894 from job_201404071443_0001 sending #35=0A=
2014-04-07 14:45:38=2C329 DEBUG org.apache.hadoop.ipc.Client: IPC Client (4=
7) connection to /127.0.0.1:45894 from job_201404071443_0001 got value #35=
=0A=
2014-04-07 14:45:38=2C329 DEBUG org.apache.hadoop.ipc.RPC: Call: ping 1=0A=
2014-04-07 14:45:38=2C910 DEBUG org.apache.giraph.zk.PredicateLock: waitMse=
cs: Got timed signaled of false=0A=
2014-04-07 14:45:38=2C910 DEBUG org.apache.giraph.zk.PredicateLock: waitMse=
cs: Wait for 0=0A=
2014-04-07 14:45:38=2C910 DEBUG org.apache.giraph.zk.PredicateLock: waitMse=
cs: Got timed signaled of false=0A=
2014-04-07 14:45:38=2C910 DEBUG org.apache.giraph.zk.PredicateLock: waitMse=
cs: Wait for 0</span></pre></div><div><span>These calls go on for 10 minute=
s and then the job is killed by Hadoop.</span></div><div><span><br></span><=
/div><div><span>Thanks=2C</span></div><div><span></span>Vikesh Khanna=2C<br=
>Masters=2C Computer Science (Class of 2015)<br>Stanford University<br><spa=
n></span><br></div><div><br></div><hr id=3D"ecxzwchr"><div style=3D"color:#=
000=3Bfont-weight:normal=3Bfont-style:normal=3Btext-decoration:none=3Bfont-=
family:Helvetica=2CArial=2Csans-serif=3Bfont-size:12pt=3B"><b>From: </b>"Lu=
kas Nalezenec" &lt=3Blukas.nalezenec@firma.seznam.cz&gt=3B<br><b>To: </b>us=
er@giraph.apache.org<br><b>Sent: </b>Monday=2C April 7=2C 2014 4:13:23 AM<b=
r><b>Subject: </b>Re: Giraph job hangs indefinitely and is eventually kille=
d by JobTracker<br><div><br></div>=0A=
  =0A=
    =0A=
  =0A=
  =0A=
    <div class=3D"ecxmoz-cite-prefix">Hi=2C<br>=0A=
      <code>Try making and analyzing memory dump after exception (JVM=0A=
        param -XX:+HeapDumpOnOutOfMemoryError</code>)<br>=0A=
      What configuration (mainly Partition class) do you use ?<br>=0A=
      Lukas<br>=0A=
      <br>=0A=
      On 7.4.2014 11:45=2C Vikesh Khanna wrote:<br>=0A=
    </div>=0A=
    <blockquote cite=3D"mid:1904550749.28380.1396863944166.JavaMail.zimbra@=
stanford.edu">=0A=
      =0A=
      <div style=3D"font-family:times new roman=2C new york=2C times=2C ser=
if=3Bfont-size:12pt=3Bcolor:#000000=3B">=0A=
        <div>Hi=2C</div>=0A=
        <div><br>=0A=
        </div>=0A=
        <div>Any ideas why Giraph waits indefinitely? I've been stuck on=0A=
          this for a long time now.&nbsp=3B</div>=0A=
        <div><br>=0A=
        </div>=0A=
        <div>Thanks=2C</div>=0A=
        <div><span></span>Vikesh Khanna=2C<br>=0A=
          Masters=2C Computer Science (Class of 2015)<br>=0A=
          Stanford University<br>=0A=
          <span></span><br>=0A=
        </div>=0A=
        <div><br>=0A=
        </div>=0A=
        <hr id=3D"ecxzwchr">=0A=
        <div style=3D"color:#000=3Bfont-weight:normal=3Bfont-style:normal=
=3Btext-decoration:none=3Bfont-family:Helvetica=2CArial=2Csans-serif=3Bfont=
-size:12pt=3B"><b>From:=0A=
          </b>"Vikesh Khanna" <a class=3D"ecxmoz-txt-link-rfc2396E" href=3D=
"mailto:vikesh@stanford.edu" target=3D"_blank">&lt=3Bvikesh@stanford.edu&gt=
=3B</a><br>=0A=
          <b>To: </b><a class=3D"ecxmoz-txt-link-abbreviated" href=3D"mailt=
o:user@giraph.apache.org" target=3D"_blank">user@giraph.apache.org</a><br>=
=0A=
          <b>Sent: </b>Friday=2C April 4=2C 2014 6:06:51 AM<br>=0A=
          <b>Subject: </b>Re: Giraph job hangs indefinitely and is=0A=
          eventually killed by JobTracker<br>=0A=
          <div><br>=0A=
          </div>=0A=
          <div style=3D"font-family:times new roman=2C new york=2C times=2C=
 serif=3Bfont-size:12pt=3Bcolor:#000000=3B">=0A=
            <div>Hi Avery=2C<br>=0A=
            </div>=0A=
            <div><br>=0A=
            </div>=0A=
            <div>I tried both the options. It does appear to be a GC=0A=
              problem. The problem continues with the second option as=0A=
              well :(. I have attached the logs after enabling the first=0A=
              set of options and using 1 worker. Would be very helpful=0A=
              if you can take a look.&nbsp=3B</div>=0A=
            <div><br>=0A=
            </div>=0A=
            <div>This machine has 1 TB memory. We ran benchmarks of=0A=
              various other graph libraries on this machine and they=0A=
              worked fine (even with graphs 10x larger than the Giraph=0A=
              PageRank benchmark - 40 million nodes). I am sure Giraph=0A=
              would work fine as well - this should not be a resource=0A=
              constraint. &nbsp=3B</div>=0A=
            <div><br>=0A=
            </div>=0A=
            <div>Thanks=2C</div>=0A=
            <div><span></span>Vikesh Khanna=2C<br>=0A=
              Masters=2C Computer Science (Class of 2015)<br>=0A=
              Stanford University<br>=0A=
              <span></span><br>=0A=
            </div>=0A=
            <div><br>=0A=
            </div>=0A=
            <hr id=3D"ecxzwchr">=0A=
            <div style=3D"color:#000=3Bfont-weight:normal=3Bfont-style:norm=
al=3Btext-decoration:none=3Bfont-family:Helvetica=2CArial=2Csans-serif=3Bfo=
nt-size:12pt=3B"><b>From:=0A=
              </b>"Avery Ching" <a class=3D"ecxmoz-txt-link-rfc2396E" href=
=3D"mailto:aching@apache.org" target=3D"_blank">&lt=3Baching@apache.org&gt=
=3B</a><br>=0A=
              <b>To: </b><a class=3D"ecxmoz-txt-link-abbreviated" href=3D"m=
ailto:user@giraph.apache.org" target=3D"_blank">user@giraph.apache.org</a><=
br>=0A=
              <b>Sent: </b>Thursday=2C April 3=2C 2014 7:26:56 PM<br>=0A=
              <b>Subject: </b>Re: Giraph job hangs indefinitely and is=0A=
              eventually killed by JobTracker<br>=0A=
              <div><br>=0A=
              </div>=0A=
              <div class=3D"ecxmoz-cite-prefix">This is for a single worker=
=0A=
                it appears.&nbsp=3B Most likely your worker went into GC an=
d=0A=
                never returned.&nbsp=3B You can try with GC settings turned=
 on=2C=0A=
                try adding something like.<br>=0A=
                <br>=0A=
                -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails=0A=
                -XX:+PrintGCTimeStamps -verbose:gc <br>=0A=
                <br>=0A=
                You could also try the concurrent mark/sweep collector.&nbs=
p=3B=0A=
                <br>=0A=
                <br>=0A=
                -XX:+UseConcMarkSweepGC<br>=0A=
                <br>=0A=
                Any chance you can use more workers and/or get more=0A=
                memory?<br>=0A=
                <br>=0A=
                Avery<br>=0A=
                <br>=0A=
                On 4/3/14=2C 5:46 PM=2C Vikesh Khanna wrote:<br>=0A=
              </div>=0A=
              <blockquote cite=3D"mid:1696746208.4389778.1396572398582.Java=
Mail.zimbra@stanford.edu">=0A=
                <div style=3D"font-family:times new roman=2C new york=2C ti=
mes=2C serif=3Bfont-size:12pt=3Bcolor:#000000=3B">=0A=
                  <div>@Avery=2C<br>=0A=
                  </div>=0A=
                  <div><br>=0A=
                  </div>=0A=
                  <div>Thanks for the help. I checked out the task logs=2C=
=0A=
                    and turns out there was an exception &nbsp=3B"GC overhe=
ad=0A=
                    limit exceeded" due to which the benchmarks wouldn't=0A=
                    even load the vertices. I got around it by=0A=
                    increasing the heap size (mapred.child.java.opts) in=0A=
                    mapred-site.xml. The benchmark is loading vertices=0A=
                    now. However=2C the job is still getting stuck=0A=
                    indefinitely (and eventually killed). I have=0A=
                    attached the small log for the map task on 1 worker.=0A=
                    Would really appreciate if you can help understand=0A=
                    the cause.&nbsp=3B</div>=0A=
                  <div><br>=0A=
                  </div>=0A=
                  <div>Thanks=2C</div>=0A=
                  <div><span></span>Vikesh Khanna=2C<br>=0A=
                    Masters=2C Computer Science (Class of 2015)<br>=0A=
                    Stanford University<br>=0A=
                    <span></span><br>=0A=
                  </div>=0A=
                  <div><br>=0A=
                  </div>=0A=
                  <hr id=3D"ecxzwchr">=0A=
                  <div style=3D"color:#000=3Bfont-weight:normal=3Bfont-styl=
e:normal=3Btext-decoration:none=3Bfont-family:Helvetica=2CArial=2Csans-seri=
f=3Bfont-size:12pt=3B"><b>From:=0A=
=0A=
                    </b>"Praveen kumar s.k" <a class=3D"ecxmoz-txt-link-rfc=
2396E" href=3D"mailto:skpraveenkumar9@gmail.com" target=3D"_blank">&lt=3Bsk=
praveenkumar9@gmail.com&gt=3B</a><br>=0A=
                    <b>To: </b><a class=3D"ecxmoz-txt-link-abbreviated" hre=
f=3D"mailto:user@giraph.apache.org" target=3D"_blank">user@giraph.apache.or=
g</a><br>=0A=
                    <b>Sent: </b>Thursday=2C April 3=2C 2014 4:40:07 PM<br>=
=0A=
                    <b>Subject: </b>Re: Giraph job hangs indefinitely=0A=
                    and is eventually killed by JobTracker<br>=0A=
                    <div><br>=0A=
                    </div>=0A=
                    You have given -w 30=2C make sure that that many=0A=
                    number of map tasks are<br>=0A=
                    configured in your cluster<br>=0A=
                    <div><br>=0A=
                    </div>=0A=
                    On Thu=2C Apr 3=2C 2014 at 6:24 PM=2C Avery Ching <a cl=
ass=3D"ecxmoz-txt-link-rfc2396E" href=3D"mailto:aching@apache.org" target=
=3D"_blank">&lt=3Baching@apache.org&gt=3B</a>=0A=
                    wrote:<br>=0A=
                    &gt=3B My guess is that you don't get your resources.=
=0A=
                    &nbsp=3BIt would be very helpful to<br>=0A=
                    &gt=3B print the master log. &nbsp=3BYou can find it wh=
en the=0A=
                    job is running to look at<br>=0A=
                    &gt=3B the Hadoop counters on the job UI page.<br>=0A=
                    &gt=3B<br>=0A=
                    &gt=3B Avery<br>=0A=
                    &gt=3B<br>=0A=
                    &gt=3B<br>=0A=
                    &gt=3B On 4/3/14=2C 12:49 PM=2C Vikesh Khanna wrote:<br=
>=0A=
                    &gt=3B<br>=0A=
                    &gt=3B Hi=2C<br>=0A=
                    &gt=3B<br>=0A=
                    &gt=3B I am running the PageRank benchmark under=0A=
                    giraph-examples from giraph-1.0.0<br>=0A=
                    &gt=3B release. I am using the following command to=0A=
                    run the job (as mentioned here)<br>=0A=
                    &gt;<br>
                    &gt; vikesh@madmax<br>
                    &gt;
/lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples<br>
                    &gt; $ $HADOOP_HOME/bin/hadoop jar<br>
                    &gt;
$GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar<br>
                    &gt; org.apache.giraph.benchmark.PageRankBenchmark
                    -e 1 -s 3 -v -V 50000000 -w 30<br>
                    &gt;<br>
                    &gt;<br>
                    &gt; However, the job gets stuck at map 9% and is
                    eventually killed by the<br>
                    &gt; JobTracker on reaching the mapred.task.timeout
                    (default 10 minutes). I tried<br>
                    &gt; increasing the timeout to a very large value,
                    and the job went on for over 8<br>
                    &gt; hours without completion. I also tried the
                    ShortestPathsBenchmark, which<br>
                    &gt; also fails the same way.<br>
                    &gt;<br>
                    &gt;<br>
                    &gt; Any help is appreciated.<br>
                    &gt;<br>
                    &gt;<br>
                    &gt; ****** ---------------- ***********<br>
                    &gt;<br>
                    &gt;<br>
                    &gt; Machine details:<br>
                    &gt;<br>
                    &gt; Linux version 2.6.32-279.14.1.el6.x86_64<br>
                    &gt; (<a class="ecxmoz-txt-link-abbreviated" href="mailto:mockbuild@c6b8.bsys.dev.centos.org" target="_blank">mockbuild@c6b8.bsys.dev.centos.org</a>)
                    (gcc version 4.4.6 20120305 (Red Hat<br>
                    &gt; 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC
                    2012<br>
                    &gt;<br>
                    &gt; Architecture: x86_64<br>
                    &gt; CPU op-mode(s): 32-bit, 64-bit<br>
                    &gt; Byte Order: Little Endian<br>
                    &gt; CPU(s): 64<br>
                    &gt; On-line CPU(s) list: 0-63<br>
                    &gt; Thread(s) per core: 1<br>
                    &gt; Core(s) per socket: 8<br>
                    &gt; CPU socket(s): 8<br>
                    &gt; NUMA node(s): 8<br>
                    &gt; Vendor ID: GenuineIntel<br>
                    &gt; CPU family: 6<br>
                    &gt; Model: 47<br>
                    &gt; Stepping: 2<br>
                    &gt; CPU MHz: 1064.000<br>
                    &gt; BogoMIPS: 5333.20<br>
                    &gt; Virtualization: VT-x<br>
                    &gt; L1d cache: 32K<br>
                    &gt; L1i cache: 32K<br>
                    &gt; L2 cache: 256K<br>
                    &gt; L3 cache: 24576K<br>
                    &gt; NUMA node0 CPU(s): 1-8<br>
                    &gt; NUMA node1 CPU(s): 9-16<br>
                    &gt; NUMA node2 CPU(s): 17-24<br>
                    &gt; NUMA node3 CPU(s): 25-32<br>
                    &gt; NUMA node4 CPU(s): 0,33-39<br>
                    &gt; NUMA node5 CPU(s): 40-47<br>
                    &gt; NUMA node6 CPU(s): 48-55<br>
                    &gt; NUMA node7 CPU(s): 56-63<br>
                    &gt;<br>
                    &gt;<br>
                    &gt; I am using a pseudo-distributed Hadoop cluster
                    on a single machine with<br>
                    &gt; 64 cores.<br>
                    &gt;<br>
                    &gt;<br>
                    &gt; *****-------------*******<br>
                    &gt;<br>
                    &gt;<br>
                    &gt; Thanks,<br>
                    &gt; Vikesh Khanna,<br>
                    &gt; Masters, Computer Science (Class of 2015)<br>
                    &gt; Stanford University<br>
                    &gt;<br>
                    &gt;<br>
                    &gt;<br>
                  </div>
                  <div><br>
                  </div>
                </div>
              </blockquote>
              <br>
            </div>
            <div><br>
            </div>
          </div>
        </div>
        <div><br>
        </div>
      </div>
    </blockquote>
    <br>
</div><div><br></div></div></div><div><br></div></div></div></div>
</div></body>
</html>

--_11dfd472-fb63-47e0-87cc-b2222b24cf68_--