Yes, before conversion they were also sorted by vertex ID.
Before conversion, the edges were in the format [source destination weight].
I'm using a medium-tier EC2 instance, c3.2xlarge. It has 8 vCPUs and 15 GiB of memory according to Amazon's website.
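
For anyone following along, the conversion from that edge list to the one-vertex-per-line JSON layout can be sketched roughly as below. This is only an illustration, not the script actually used: the function name `edges_to_json_lines` and the default vertex value of 0 are assumptions, it holds all edges in memory, and vertices with no outgoing edges get no line of their own.

```python
import json
from itertools import groupby


def edges_to_json_lines(edge_lines, default_value=0):
    """Turn 'source destination weight' lines into JSON vertex lines.

    Hypothetical helper: yields one string per source vertex, shaped like
    [vertex id, vertex value, [[dst id, edge weight], ...]].
    """
    edges = []
    for line in edge_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        weight = float(parts[2])
        if weight.is_integer():
            weight = int(weight)  # keep whole weights compact, e.g. 1 not 1.0
        edges.append((int(parts[0]), int(parts[1]), weight))

    # Stable sort by source ID so each vertex appears exactly once, in order.
    edges.sort(key=lambda e: e[0])
    for src, group in groupby(edges, key=lambda e: e[0]):
        out_edges = [[dst, w] for _, dst, w in group]
        yield json.dumps([src, default_value, out_edges], separators=(",", ":"))


if __name__ == "__main__":
    for json_line in edges_to_json_lines(["0 1 1", "0 4 1", "1 0 1", "0 5 1"]):
        print(json_line)
```

Grouping after a sort means repeated source IDs scattered through the edge file still end up merged into a single line per vertex.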

Any suggestions on tweaking Xmx? I tried doing that without any luck; the job didn't run the map phase and then failed.

On Thu, Jul 3, 2014 at 1:57 PM, Young Han wrote:
From the other thread...

Yeah, your input format looks correct. Did you have the graph sorted by source vertex IDs before conversion? (I'm not sure if duplicate entries with the same source ID matter, but just in case.)
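
A quick way to rule out ordering or duplicate-ID problems is to scan the converted file before submitting the job. The sketch below is only an assumption about what such a check could look like: `check_graph_lines` is a made-up helper, and it assumes one JSON array per line of the form [vertex id, vertex value, [[dst id, edge weight], ...]]. Whether Giraph actually requires sorted or unique IDs is not something this check establishes.

```python
import json


def check_graph_lines(lines):
    """Return a list of (line_number, problem) tuples for suspicious lines."""
    problems = []
    seen_ids = set()
    prev_id = None
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:
            # Also catches two records accidentally joined on one line.
            problems.append((number, "not valid JSON"))
            continue
        well_formed = (
            isinstance(record, list)
            and len(record) == 3
            and isinstance(record[0], int)
            and isinstance(record[2], list)
        )
        if not well_formed:
            problems.append((number, "unexpected record shape"))
            continue
        vertex_id, _, edges = record
        if vertex_id in seen_ids:
            problems.append((number, "duplicate vertex id"))
        elif prev_id is not None and vertex_id < prev_id:
            problems.append((number, "vertex id out of order"))
        seen_ids.add(vertex_id)
        prev_id = vertex_id
        if any(not (isinstance(e, list) and len(e) == 2) for e in edges):
            problems.append((number, "malformed edge entry"))
    return problems


if __name__ == "__main__":
    sample = ['[0,0,[[1,1]]]', '[1,0,[[0,1]]]', '[1,0,[[2,1]]]']
    for number, problem in check_graph_lines(sample):
        print(f"line {number}: {problem}")
```

To check a real file, pass an open file object, e.g. `check_graph_lines(open("CA.txt"))`.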

They're all out-of-memory errors, so I think Xmx is the culprit. What type of EC2 instances are you using? You probably want to use something larger than t1.micro or m1.small.

Young

On Thu, Jul 3, 2014 at 4:53 PM, Bryan Rowe wrote:
First of all, I started this email thread with my old email from Yahoo, which was a mistake because it kept sending out duplicates. Sorry for the inconvenience, but I'll continue it using this email thread from now on.
I originally posted this:
Hello,

Giraph: release-1.0.0-RC3

In short, when I use large graphs with the Shortest Paths example, it fails. But when I use the small graph provided in the Quick Start guide, it succeeds.
I converted all of my large graphs into the format shown in the Quick Start guide to simplify things.
I'm using a one-node setup.

Here is the command I'm using:
org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/ubuntu/input/CA.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op /user/ubuntu/output/shortestpaths
-w 1

(all on one line)

CA.txt is a large graph file: 96,026,228 bytes

The job fails after 10 min 46 sec.

Two Map tasks are created when run.
The first one, task_201407021636_0006_m_000000, is KILLED. syslog:

The second one, task_201407021636_0006_m_000001, goes to the FAILED state. syslog:
I've tried increasing Java heap space in hadoop/conf/mapred-site.xml by adding this:
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

But that just caused the entire job to fail from the start.

Before using this version of Giraph, I used 1.0.0 and 1.1.0-RC0, and those versions gave me more, and different, errors to debug that relate to problems with Hadoop itself. So the Giraph version I'm currently using seems to be the best for me because these errors seem more manageable.

What can I do to fix this error? I thought Giraph was built for large-scale graph processing, so I suppose someone testing large graphs has run into this problem before. I searched through the mailing list archives and couldn't find anything, though.

I can provide more information if you need it. Thanks a lot.

Bryan Rowe

10 minutes seems way too long to load in 91 MB from HDFS. Are you sure your graph's format is correct? For the JSON input formats, each line of your file should be:

[vertex id, vertex value, [[dst id, edge weight], [dst id, edge weight], ..., [dst id, edge weight]]]

You can set the vertex value for every vertex to 0 (SSSP will overwrite that value in the first superstep). If your graph doesn't have edge weights, you can just set them to 1.

Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.

Young

Hi Young,

I believe my graph has the correct format.
Here are the first 40 lines of the graph I'm using:
[0,0,[[1,1],[4,1],[5,1]]]
[1,0,[[0,1],[2,1],[3,1]]]
[2,0,[[1,1],[6,1],[7,1]]]
[3,0,[[8,1],[1,1],[9,1]]]
[4,0,[[0,1],[10,1],[11,1]]]
[5,0,[[0,1]]]
[6,0,[[2,1],[12,1]]]
[7,0,[[2,1],[12,1],[13,1]]]
[8,0,[[11,1],[3,1],[60,1]]]
[9,0,[[3,1]]]
[10,0,[[35,1],[4,1],[38,1]]]
[11,0,[[8,1],[59,1],[4,1]]]
[12,0,[[41,1],[6,1],[7,1]]]
[13,0,[[89,1],[90,1],[91,1],[7,1]]]
[14,0,[[18,1],[19,1],[15,1]]]
[15,0,[[16,1],[17,1],[14,1]]]
[16,0,[[20,1],[21,1],[22,1],[15,1]]]
[17,0,[[24,1],[23,1],[21,1],[15,1]]]
[18,0,[[24,1],[14,1]]]
[19,0,[[25,1],[22,1],[14,1]]]
[20,0,[[16,1],[25,1],[26,1]]]
[21,0,[[16,1],[17,1],[30,1]]]
[22,0,[[16,1],[19,1]]]
[23,0,[[17,1],[105,1]]]
[24,0,[[17,1],[18,1],[58,1]]]
[25,0,[[27,1],[19,1],[20,1]]]
[26,0,[[28,1],[27,1],[20,1]]]
[27,0,[[25,1],[26,1],[29,1]]]
[28,0,[[26,1],[29,1],[30,1]]]
[29,0,[[27,1],[28,1],[31,1]]]
[30,0,[[32,1],[33,1],[28,1],[21,1]]]
[31,0,[[34,1],[29,1]]]
[32,0,[[105,1],[30,1],[39,1]]]
[33,0,[[30,1]]]
[34,0,[[38,1],[31,1]]]
[35,0,[[10,1],[36,1],[37,1]]]
[36,0,[[40,1],[35,1],[39,1]]]
[37,0,[[41,1],[35,1]]]
[38,0,[[10,1],[34,1]]]
[39,0,[[32,1],[58,1],[36,1],[119,1]]]
[40,0,[[90,1],[36,1]]]

Also, sorry about sending the email twice. My email client messed up.

Thanks,
Bryan

I've tried a larger graph. It ran for 2 hours and then failed with what I believe are different error messages.

Bryan
