incubator-giraph-user mailing list archives

From David Garcia <dgar...@potomacfusion.com>
Subject Re: Vertex exists error when processing input splits for Sequence file
Date Mon, 30 Jan 2012 15:50:57 GMT
Thx again, Avery, for your prompt responses. The problem you suggested didn't turn out to be
the actual problem, but you led me in the right direction. It turns out that all my Vertex
instances were unique (i.e., new vertices were being created in getCurrentVertex()). . .however,
SequenceFileRecordReader reuses the same key and value objects for its getCurrentKey() and
getCurrentValue() methods. So every time the record reader advances (nextKeyValue()), those
shared objects are overwritten in place, and every vertex that still holds a reference to them
changes too. This was a real pain to figure out. Thx again for all your help!!

-David
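A minimal sketch of the fix described above, under stated assumptions: it does not use Hadoop or Giraph, and `VertexId`, `SimpleVertex`, `CopyingVertexReader` and `CopyDemo` are hypothetical stand-ins for `Text`, `BasicVertex` and a custom vertex reader. What it shows is that creating a new vertex per call is not enough; `getCurrentVertex()` also has to copy the key and value the record reader hands out. With the real classes, `Text` and `MapWritable` both provide copy constructors, and Hadoop's `WritableUtils.clone(orig, conf)` copies a Writable through serialization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mutable id type standing in for Hadoop's Text.
final class VertexId {
    private String id;

    VertexId(String id) { this.id = id; }

    VertexId(VertexId other) { this.id = other.id; } // copy constructor

    void set(String id) { this.id = id; }

    String get() { return id; }
}

// Hypothetical minimal vertex standing in for BasicVertex.
final class SimpleVertex {
    private VertexId id;
    private Map<String, String> value;

    void initialize(VertexId id, Map<String, String> value) {
        this.id = id;
        this.value = value;
    }

    String getId() { return id.get(); }

    Map<String, String> getValue() { return value; }
}

// Hypothetical vertex reader. Like SequenceFileRecordReader, it refills one
// key object and one value object per record, so getCurrentVertex() copies
// both before handing them to a freshly created vertex.
final class CopyingVertexReader {
    private final List<String> ids;
    private final VertexId key = new VertexId("");             // reused for every record
    private final Map<String, String> value = new HashMap<>(); // reused for every record
    private int pos = -1;

    CopyingVertexReader(List<String> ids) { this.ids = ids; }

    boolean nextVertex() {
        if (pos + 1 >= ids.size()) {
            return false;
        }
        pos++;
        key.set(ids.get(pos));           // overwrites the previous key in place
        value.clear();
        value.put("name", ids.get(pos)); // overwrites the previous value in place
        return true;
    }

    SimpleVertex getCurrentVertex() {
        SimpleVertex vertex = new SimpleVertex(); // a new vertex per call...
        vertex.initialize(new VertexId(key),      // ...plus a copy of the key
                          new HashMap<>(value));  // ...and a copy of the value
        return vertex;
    }
}

final class CopyDemo {
    public static void main(String[] args) {
        CopyingVertexReader reader = new CopyingVertexReader(Arrays.asList("v1", "v2"));
        List<SimpleVertex> loaded = new ArrayList<>();
        while (reader.nextVertex()) {
            loaded.add(reader.getCurrentVertex());
        }
        for (SimpleVertex v : loaded) {
            System.out.println(v.getId() + " " + v.getValue());
        }
    }
}
```

Running `CopyDemo` prints `v1 {name=v1}` and then `v2 {name=v2}`. Dropping the two copies in `getCurrentVertex()` would make both vertices report `v2`, which is the symptom seen in this thread.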

From: David Garcia <dgarcia@potomacfusion.com>
Reply-To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Date: Mon, 30 Jan 2012 08:48:21 -0600
To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Subject: Re: Vertex exists error when processing input splits for Sequence file

Ok, that's a good point. My getCurrentVertex() method looks like this:

@Override
public BasicVertex<I, V, E, M> getCurrentVertex()
        throws IOException, InterruptedException {
    // A new vertex object is created on every call...
    BasicVertex<I, V, E, M> vertex =
        BspUtils.createVertex(getContext().getConfiguration());

    // ...but the id and value are taken directly from the record reader.
    I vertexID = (I) getRecordReader().getCurrentKey();
    V vertexValue = (V) getRecordReader().getCurrentValue();
    try {
        vertex.initialize(vertexID, vertexValue, null, null);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return vertex;
}


Perhaps BspUtils is reusing it?

From: Avery Ching <aching@apache.org>
Reply-To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Date: Mon, 30 Jan 2012 02:37:09 -0600
To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Subject: Re: Vertex exists error when processing input splits for Sequence file

In your implementation of VertexReader#getCurrentVertex(), are you providing a new BasicVertex
object each time (after nextVertex() is called)? If you are reusing the same BasicVertex
object, you could get problems like the ones you describe.

Avery
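One way to test for the kind of reuse Avery is asking about is to compare object identity (`==`) rather than `equals()`. The sketch below is a hypothetical debugging helper (`ReuseDetector` is not part of Giraph) that advances a reader through two callbacks and reports whether any instance is handed out for two different records. It can be pointed at the vertex objects or at the id objects held inside them; in this thread the vertices turned out to be fresh while the ids were shared.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical debugging helper (not part of Giraph). It advances a reader via
// `advance`, fetches the current object via `current`, and reports whether the
// same instance was ever returned for two different records. Objects are
// compared by identity (==), not equals().
final class ReuseDetector {
    static <T> boolean reusesInstances(BooleanSupplier advance, Supplier<T> current) {
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
        while (advance.getAsBoolean()) {
            if (!seen.add(current.get())) {
                return true; // this instance was already handed out once
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Object shared = new Object();
        // A "reader" that returns one shared object for three records.
        System.out.println(reusesInstances(() -> calls[0]++ < 3, () -> shared));
        int[] calls2 = {0};
        // A "reader" that allocates a fresh object per record.
        System.out.println(reusesInstances(() -> calls2[0]++ < 3, Object::new));
    }
}
```

`main` prints `true` and then `false`. If a real reader's methods declare checked exceptions, they need a small wrapper before they can be passed as a `BooleanSupplier` or `Supplier`.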

On 1/30/12 12:24 AM, David Garcia wrote:
Thx for the response Avery. . .unfortunately, I can confirm that I do not have duplicates
in my data.  I have narrowed the problem to the following method:

private VertexEdgeCount readVerticesFromInputSplit(
        InputSplit inputSplit) throws IOException, InterruptedException {
    ...
    while (vertexReader.nextVertex()) {
        BasicVertex<I, V, E, M> readerVertex =
            vertexReader.getCurrentVertex();
        ...
When the .nextVertex() method is called, it mutates the vertex map of every Partition in the
inputSplitCache: every vertex already stored in a partition is turned into the vertex produced
by that .nextVertex() call. (Again, note that the underlying RecordReader is a
SequenceFileRecordReader.) For example, if I have the following inputSplitCache:

inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value -> Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> 00kK4. . .

I have one vertex in my partition. . .assuming that the next vertex ID is mM424, after vertexReader.nextVertex()
is called, the data structure changes to this. . .

inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value -> Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> mM424. . .

After partition.putVertex(. . .) is called, another identical vertex is added.

inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value -> Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> mM424. . .
[1] -> mM424. . .

This leads to the error in my previous email. . .All the vertices in my graph end up with
the data of my final Vertex, as this pattern suggests. It's almost as if some weird AspectJ
aspect is intercepting the call to .nextVertex(). I'm happy to share my code. I feel it's fairly
simple: it's just a SequenceFile input format and a trivial vertex class.

-Dave
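The pattern described above, where every stored entry ends up showing the last record read, is what happens when a reader refills one mutable object and the caller keeps references to it instead of copies. A minimal, self-contained sketch; `MutableKey`, `ReusingReader` and `ReuseDemo` are hypothetical stand-ins, not Hadoop or Giraph classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable Hadoop Writable such as Text.
final class MutableKey {
    private String value = "";

    void set(String value) { this.value = value; }

    String get() { return value; }
}

// Hypothetical reader that refills one shared key object for every record,
// instead of allocating a new object per record.
final class ReusingReader {
    private final String[] records;
    private final MutableKey key = new MutableKey(); // the only key instance
    private int pos = -1;

    ReusingReader(String... records) { this.records = records; }

    boolean nextKeyValue() {
        if (pos + 1 >= records.length) {
            return false;
        }
        pos++;
        key.set(records[pos]); // overwrites the previous record in place
        return true;
    }

    MutableKey getCurrentKey() { return key; }
}

final class ReuseDemo {
    // Keeps the references the reader returns (no copy), then reads them back.
    static List<String> collectByReference(ReusingReader reader) {
        List<MutableKey> kept = new ArrayList<>();
        while (reader.nextKeyValue()) {
            kept.add(reader.getCurrentKey()); // every element aliases the same object
        }
        List<String> seen = new ArrayList<>();
        for (MutableKey k : kept) {
            seen.add(k.get());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference(new ReusingReader("v1", "v2")));
    }
}
```

`main` prints `[v2, v2]`: two distinct records in the input, but two identical entries once they are read back, which matches the duplicated `mM424` entries in the vertexMap dump.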

From: Avery Ching <aching@apache.org>
Reply-To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Date: Mon, 30 Jan 2012 01:28:12 -0600
To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Subject: Re: Vertex exists error when processing input splits for Sequence file

Hi David,

So from the errors, it appears that your input has multiple vertices with the same vertex
id. Currently we throw an exception to prevent this, as it is typically not what you want.
You probably want to watch the vertices being processed from the vertex input format and see
why you are getting duplicates. It's likely either that the data actually has vertices with
the same vertex id or that there is an error in your custom vertex input format.

To help debug, you might want to add some logging to your record reader and print the vertex
ids or you can add some logging to where that code is called in BspServiceWorker#readVerticesFromInputSplit().

Hope that helps,

Avery
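A sketch of the logging suggestion above combined with a duplicate check, assuming each record is a simple `{id, value}` string pair. `DuplicateIdCheck` is hypothetical and only mirrors the idea behind the "already exists!" error quoted in this thread; it is not Giraph's implementation. Logging ids at the point where they are read shows what the input really contains, so distinct ids in the log followed by a collision later would point at object reuse rather than bad data.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical loader (not Giraph's code). Each record is an {id, value} pair.
// Every id is logged as it is read, and a repeated id is rejected, similar in
// spirit to the "already exists!" check in the error quoted in this thread.
final class DuplicateIdCheck {
    private static final Logger LOG = Logger.getLogger(DuplicateIdCheck.class.getName());

    static Map<String, String> load(List<String[]> records) {
        Map<String, String> vertices = new LinkedHashMap<>();
        for (String[] record : records) {
            String id = record[0];
            // Only visible once the logger and its handler are set to Level.FINE.
            LOG.fine("read vertex id=" + id);
            if (vertices.containsKey(id)) {
                throw new IllegalStateException("Vertex id=" + id + " already exists!");
            }
            vertices.put(id, record[1]);
        }
        return vertices;
    }

    public static void main(String[] args) {
        System.out.println(load(Arrays.asList(new String[] {"a", "x"}, new String[] {"b", "y"})));
    }
}
```

`main` prints `{a=x, b=y}`; feeding it two records with id `a` throws `IllegalStateException`.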

On 1/29/12 8:13 PM, David Garcia wrote:


Hello, I get this error when I try to run my job:

2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: reserveInputSplit: reservedPath = null, 1 of 1 InputSplits are finished.
2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: setup: Finally loaded a total of (v=0, e=0)
2012-01-29 21:50:18,764 INFO org.apache.giraph.graph.BspService: process: inputSplitsAllDoneChanged (all vertices sent from input splits)
2012-01-29 21:50:18,766 ERROR org.apache.giraph.graph.GraphMapper: setup: Caught exception just before end of setup

java.lang.IllegalStateException: moveVerticesToWorker: Vertex Vertex(id=zzYNBgKt2LF6ClLA2eMBzuN7SkA.,value=org.apache.hadoop.io.MapWritable@5ce8787a,#edges=0) already exists!
        at org.apache.giraph.graph.BspServiceWorker.movePartitionsToWorker(BspServiceWorker.java:1389)
        at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:624)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
        ...
I'm not sure where to start debugging. . .BspServiceWorker is hella big. All input is welcome.
As I mentioned, I'm processing a SequenceFile that has Text keys and MapWritable values.
I would like the vertices to have Text ids and MapWritable values. (I'm not inserting
any edges for the time being. . .I just want to see the file get split properly.) I have
implemented custom input formats and record readers. Thx

-Dave
