From: David Garcia
To: "giraph-user@incubator.apache.org"
Reply-To: giraph-user@incubator.apache.org
Date: Mon, 30 Jan 2012 15:00:11 -0600
Subject: Re: Vertex exists error when processing input splits for Sequence file
In-Reply-To: <4F26D2DD.60907@apache.org>
Mailing-List: contact giraph-user-help@incubator.apache.org; run by ezmlm
Content-Type: multipart/alternative; boundary="_000_CB4C2FE65F33dgarciapotomacfusioncom_"
MIME-Version: 1.0

--_000_CB4C2FE65F33dgarciapotomacfusioncom_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Sure. . .I'm basically trying to compute a diff between two commit nodes of a Git-style history store.  Given two graphs, find the subgraph that differs between them.

From: Avery Ching <aching@apache.org>
Reply-To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Date: Mon, 30 Jan 2012 11:26:53 -0600
To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Subject: Re: Vertex exists error when processing input splits for Sequence file

Glad to hear you figured it out!  Keep us informed on how your experiments are going and what we can do to help.

Avery

On 1/30/12 7:50 AM, David Garcia wrote:
Thx again Avery for your prompt responses.  The problem you suggested didn't turn out to be the actual problem, but you led me in the right direction.  It turns out that all my Vertex instances were unique (i.e. new vertices were being created with getCurrentVertex()). . .however, SequenceFileRecordReader preserves singletons for its getCurrentKey() and getCurrentValue() methods: it hands back the same key and value objects every time.  So every time you call nextKeyValue() on the record reader, these singletons get updated, and every vertex already holding them changes along with them.  This was a real pain to figure out.  Thx again for all your help!!
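
A minimal sketch of the object-reuse pitfall described above, assuming nothing about the real classes: MutableKey and ReusingReader are hypothetical stand-ins for a Hadoop Writable and for the record reader, written only to show why storing the returned reference goes wrong and why storing a copy does not.

```java
import java.util.ArrayList;
import java.util.List;

final class ReusedKeyDemo {
    /** Hypothetical mutable key, standing in for a Writable such as Text. */
    static final class MutableKey {
        private String value = "";
        void set(String v) { value = v; }
        String get() { return value; }
        MutableKey copy() {
            MutableKey k = new MutableKey();
            k.set(value);
            return k;
        }
    }

    /** Hypothetical reader that always hands back one shared key instance. */
    static final class ReusingReader {
        private final String[] ids;
        private int pos = -1;
        private final MutableKey current = new MutableKey();

        ReusingReader(String... ids) { this.ids = ids; }

        boolean next() {
            if (++pos >= ids.length) {
                return false;
            }
            current.set(ids[pos]); // overwrites the shared instance in place
            return true;
        }

        MutableKey getCurrentKey() { return current; }
    }

    /** Reads all ids, storing either the shared key itself or a private copy. */
    public static List<String> readIds(boolean copyKeys, String... ids) {
        ReusingReader reader = new ReusingReader(ids);
        List<MutableKey> stored = new ArrayList<>();
        while (reader.next()) {
            MutableKey key = reader.getCurrentKey();
            stored.add(copyKeys ? key.copy() : key);
        }
        List<String> out = new ArrayList<>();
        for (MutableKey k : stored) {
            out.add(k.get());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readIds(false, "00kK4", "mM424")); // prints [mM424, mM424]
        System.out.println(readIds(true, "00kK4", "mM424"));  // prints [00kK4, mM424]
    }
}
```

With real Writables the copy would be taken some other way, for example through a copy constructor or WritableUtils.clone(); the thread does not say which fix was actually applied.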

-David

From: David Garcia <dgarcia@potomacfusion.com>
Reply-To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Date: Mon, 30 Jan 2012 08:48:21 -0600
To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Subject: Re: Vertex exists error when processing input splits for Sequence file

Ok, that's a good point.  My getCurrentVertex() method looks like this:

        @Override
        public BasicVertex<I, V, E, M> getCurrentVertex() throws IOException, InterruptedException {
            BasicVertex<I, V, E, M> vertex = BspUtils.createVertex(getContext().getConfiguration());

            I vertexID = (I)getRecordReader().getCurrentKey();
            V vertexValue = (V)getRecordReader().getCurrentValue();
            try{
                vertex.initialize(vertexID,vertexValue,null,null);
            }
            catch(Exception e){
                e.printStackTrace();
            }
            return vertex;
        }


Perhaps BspUtils is reusing it?

From: Avery Ching <aching@apache.org>
Reply-To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Date: Mon, 30 Jan 2012 02:37:09 -0600
To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Subject: Re: Vertex exists error when processing input splits for Sequence file

In your implementation of VertexReader#getCurrentVertex(), are you providing a new BasicVertex object each time (after nextVertex() is called)?  If you are reusing the same BasicVertex object, you could get problems like the ones you describe.

Avery

On 1/30/12 12:24 AM, David Garcia wrote:
Thx for the response Avery. . .unfortunately, I can confirm that I do not have duplicates in my data.  I have narrowed the problem to the following method:

private VertexEdgeCount readVerticesFromInputSplit(
            InputSplit inputSplit) throws IOException, InterruptedException {
.
.
.
while (vertexReader.nextVertex()) {
            BasicVertex<I, V, E, M> readerVertex =
                vertexReader.getCurrentVertex();
.
.
.
When the .nextVertex() method is called, it automatically mutates every HashMap in a Partition in the InputSplitCache.  The nature of the mutation is to convert every Vertex (in the respective partition) to the next vertex resulting from .nextVertex().  (Again, note that the underlying RecordReader is a SequenceFileRecordReader.)  For example, if I have the following inputSplitCache:

inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value -> Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> 00kK4. . .

I have one vertex in my partition. . .assuming that the next vertex ID is mM424, after vertexReader.nextVertex() is called, the data structure changes to this. . .

inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value -> Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> mM424. . .

After partition.putVertex(. . .) is called, another identical vertex is added.

inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value -> Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> mM424. . .
[1] -> mM424. . .

This leads to the error in my previous email. . .All the vertices in my graph end up with the data of my final Vertex, as this pattern suggests.  It's almost as if some weird AspectJ is intercepting the call to .nextVertex().  I'm happy to brandish my code.  I feel it's fairly simple.  It's just a sequenceFile input format and some trivial vertex class.
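
One plausible way to reproduce the vertexMap pattern above is sketched here, assuming the map is a plain HashMap keyed by an id object that is later mutated in place. MutableId and simulate() are hypothetical names for illustration, and this has not been checked against the Giraph source.

```java
import java.util.HashMap;
import java.util.Map;

final class MutatedMapKeyDemo {
    /** Hypothetical mutable id whose equals/hashCode follow its current value. */
    static final class MutableId {
        private String value;
        MutableId(String v) { value = v; }
        void set(String v) { value = v; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableId && ((MutableId) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
        @Override public String toString() { return value; }
    }

    /** Inserts one shared id, mutates it, then inserts it again. */
    public static Map<MutableId, String> simulate() {
        Map<MutableId, String> vertexMap = new HashMap<>();
        MutableId shared = new MutableId("00kK4");
        vertexMap.put(shared, "first vertex");

        // The reader advances and overwrites the shared id in place.
        // The entry already in the map now shows the new id, but it is
        // still filed under the hash of the old one.
        shared.set("mM424");

        // The lookup uses the new hash, misses the old entry, and adds a second one.
        vertexMap.put(shared, "second vertex");
        return vertexMap;
    }

    public static void main(String[] args) {
        System.out.println(simulate().keySet()); // prints [mM424, mM424]
    }
}
```

The map ends up holding two entries that both display the latest id, which matches the [0] and [1] entries in the dump, with no interception of nextVertex() needed.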

-Dave

From: Avery Ching <aching@apache.org>
Reply-To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Date: Mon, 30 Jan 2012 01:28:12 -0600
To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Subject: Re: Vertex exists error when processing input splits for Sequence file

Hi David,

So from the errors, it appears that your input has multiple vertices with the same vertex id.  Currently we throw an exception to prevent this from happening as it is typically not what you want.  You probably want to watch the vertices being processed from the vertex input format and see why you are getting duplicates.  It's likely to be either the data actually having vertices with the same vertex id or an error in your custom vertex input format.

To help debug, you might want to add some logging to your record reader and print the vertex ids or you can add some logging to where that code is called in BspServiceWorker#readVerticesFromInputSplit().
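
As a rough sketch of that kind of logging, assuming vertex ids can be rendered as strings: DuplicateIdLogger is a hypothetical helper, not part of Giraph, that a custom reader could call once per vertex it produces.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Logger;

final class DuplicateIdLogger {
    private static final Logger LOG =
        Logger.getLogger(DuplicateIdLogger.class.getName());

    private final Set<String> seen = new HashSet<>();
    private int duplicates = 0;

    /** Records one vertex id; returns true if it is new, false if already seen. */
    public boolean record(String vertexId) {
        boolean isNew = seen.add(vertexId);
        if (isNew) {
            LOG.fine("read vertex id " + vertexId);
        } else {
            duplicates++;
            LOG.warning("duplicate vertex id " + vertexId);
        }
        return isNew;
    }

    /** Number of ids that were recorded more than once. */
    public int duplicateCount() {
        return duplicates;
    }

    public static void main(String[] args) {
        DuplicateIdLogger ids = new DuplicateIdLogger();
        ids.record("00kK4");
        ids.record("mM424");
        ids.record("mM424"); // logged as a duplicate
        System.out.println("duplicates: " + ids.duplicateCount()); // prints duplicates: 1
    }
}
```

Passing an immutable String (for example the id's toString()) rather than the id object itself keeps the set's contents stable even if the reader later changes that object.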
Hope that helps,

Avery

On 1/29/12 8:13 PM, David Garcia wrote:


Hello, I get this error when I try to run my job:
2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: reserveInputSplit: reservedPath = null, 1 of 1 InputSplits are finished.
2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: setup: Finally loaded a total of (v=0, e=0)
2012-01-29 21:50:18,764 INFO org.apache.giraph.graph.BspService: process: inputSplitsAllDoneChanged (all vertices sent from input splits)
2012-01-29 21:50:18,766 ERROR org.apache.giraph.graph.GraphMapper: setup: Caught exception just before end of setup
java.lang.IllegalStateException: moveVerticesToWorker: Vertex Vertex(id=zzYNBgKt2LF6ClLA2eMBzuN7SkA.,value=org.apache.hadoop.io.MapWritable@5ce8787a,#edges=0) already exists!
	at org.apache.giraph.graph.BspServiceWorker.movePartitionsToWorker(BspServiceWorker.java:1389)
	at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:624)
	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
.
.
.
I'm not sure where to start debugging. . .BspServiceWorker is hella big.  All input is welcome.  As I mentioned, I'm processing a sequenceFile that has Text keys and MapWritable values.  I would like the vertices to have Text indices and MapWritable values.  (I'm not inserting any edges for the time being. . .I just want to see the file get split properly.)  I have implemented custom input formats and record readers.  Thx
-Dave



--_000_CB4C2FE65F33dgarciapotomacfusioncom_--