incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Vertex exists error when processing input splits for Sequence file
Date Mon, 30 Jan 2012 08:37:09 GMT
In your implementation of VertexReader#getCurrentVertex(), are you 
returning a new BasicVertex object each time (after nextVertex() is 
called)?  If you are reusing the same BasicVertex object, you could get 
problems like the ones you describe, because every reference already 
stored in a partition points to that one object and changes with it.
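
A minimal sketch of that failure mode, in plain Java with no Giraph 
dependency.  DemoVertex and both load methods are hypothetical stand-ins, 
not Giraph's BasicVertex or VertexReader API, and the list only 
approximates a partition's vertex map.  The two ids are taken from the 
example quoted below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class VertexReuseDemo {

    /** Hypothetical mutable vertex; a stand-in, not Giraph's BasicVertex. */
    static final class DemoVertex {
        String id;

        DemoVertex(String id) {
            this.id = id;
        }
    }

    /** Buggy pattern: one instance is mutated in place and stored repeatedly. */
    static List<DemoVertex> loadReusingOneObject(List<String> ids) {
        List<DemoVertex> partition = new ArrayList<>();
        DemoVertex reused = new DemoVertex(null);
        for (String id : ids) {
            reused.id = id;        // also changes every entry stored earlier
            partition.add(reused); // stores another reference to the same object
        }
        return partition;
    }

    /** Fixed pattern: a fresh instance is created for every record. */
    static List<DemoVertex> loadWithFreshObjects(List<String> ids) {
        List<DemoVertex> partition = new ArrayList<>();
        for (String id : ids) {
            partition.add(new DemoVertex(id));
        }
        return partition;
    }

    /** Collects the ids currently visible through the stored references. */
    static List<String> idsOf(List<DemoVertex> partition) {
        List<String> ids = new ArrayList<>();
        for (DemoVertex vertex : partition) {
            ids.add(vertex.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("00kK4", "mM424");
        System.out.println(idsOf(loadReusingOneObject(ids))); // [mM424, mM424]
        System.out.println(idsOf(loadWithFreshObjects(ids))); // [00kK4, mM424]
    }
}
```

If the key and value objects handed to your reader are themselves reused 
between records, as is common with Hadoop Writables, their contents 
probably need to be copied into the new vertex as well.  That is an 
assumption about your reader, not something confirmed in this thread.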

Avery

On 1/30/12 12:24 AM, David Garcia wrote:
> Thx for the response Avery. . .unfortunately, I can confirm that I do 
> not have duplicates in my data.  I have narrowed the problem to the 
> following method:
>
> private VertexEdgeCount readVerticesFromInputSplit(
>             InputSplit inputSplit) throws IOException, InterruptedException {
> .
> .
> .
> while (vertexReader.nextVertex()) {
>             BasicVertex<I, V, E, M> readerVertex =
>                 vertexReader.getCurrentVertex();
> .
> .
> .
> When the .nextVertex() method is called, it automatically mutates 
> every HashMap in a Partition in the InputSplitCache.  The nature of 
> the mutation is to convert every Vertex (in the respective partition) 
> to the next vertex resulting from .nextVertex().  (Again, note that the 
> underlying RecordReader is a SequenceFileRecordReader).  For example, 
> if I have the following inputSplitCache:
>
> inputSplitCache
> [0]
> Key -> BasicPartitionOwner. . .
> Value -> Partition
> Conf -> Configuration . . .
> partitionID = 0
> vertexMap
> [0] -> 00kK4. . .
>
> I have one vertex in my partition. . .assuming that the next vertex ID 
> is mM424, after vertexReader.nextVertex() is called, the data 
> structure changes to this. . .
>
> inputSplitCache
> [0]
> Key -> BasicPartitionOwner. . .
> Value -> Partition
> Conf -> Configuration . . .
> partitionID = 0
> vertexMap
> [0] -> mM424. . .
>
> After partition.putVertex(. . .) is called, another identical vertex 
> is added.
>
> inputSplitCache
> [0]
> Key -> BasicPartitionOwner. . .
> Value -> Partition
> Conf -> Configuration . . .
> partitionID = 0
> vertexMap
> [0] -> mM424. . .
> [1] -> mM424. . .
>
> This leads to the error in my previous email. . .All the vertices in 
> my graph end up with the data of my final Vertex, as this pattern 
> suggests.  It's almost as if some weird aspectJ is intercepting the 
> call to .nextVertex().  I'm happy to brandish my code.  I feel it's 
> fairly simple.  It's just a sequenceFile input format and some trivial 
> vertex class.
>
> -Dave
>
> From: Avery Ching <aching@apache.org>
> Reply-To: giraph-user@incubator.apache.org
> Date: Mon, 30 Jan 2012 01:28:12 -0600
> To: giraph-user@incubator.apache.org
> Subject: Re: Vertex exists error when processing input splits for 
> Sequence file
>
> Hi David,
>
> So from the errors, it appears that your input has multiple vertices 
> with the same vertex id.  Currently we throw an exception to prevent 
> this from happening as it is typically not what you want.  You 
> probably want to watch the vertices being processed from the vertex 
> input format and see why you are getting duplicates.  The cause is 
> likely either that the data actually has vertices with the same 
> vertex id or an error in your custom vertex input format.
>
> To help debug, you might want to add some logging to your record 
> reader and print the vertex ids or you can add some logging to where 
> that code is called in BspServiceWorker#readVerticesFromInputSplit().
>
> Hope that helps,
>
> Avery
>
> On 1/29/12 8:13 PM, David Garcia wrote:
>>
>>
>> Hello, I get this error when I try to run my job:
>> 2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: reserveInputSplit: reservedPath = null, 1 of 1 InputSplits are finished.
>> 2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: setup: Finally loaded a total of (v=0, e=0)
>> 2012-01-29 21:50:18,764 INFO org.apache.giraph.graph.BspService: process: inputSplitsAllDoneChanged (all vertices sent from input splits)
>> 2012-01-29 21:50:18,766 ERROR org.apache.giraph.graph.GraphMapper: setup: Caught exception just before end of setup
>> java.lang.IllegalStateException: moveVerticesToWorker: Vertex Vertex(id=zzYNBgKt2LF6ClLA2eMBzuN7SkA.,value=org.apache.hadoop.io.MapWritable@5ce8787a,#edges=0) already exists!
>> 	at org.apache.giraph.graph.BspServiceWorker.movePartitionsToWorker(BspServiceWorker.java:1389)
>> 	at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:624)
>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>> .
>> .
>> .
>> I'm not sure where to start debugging. . .BspServiceWorker is hella big.  All input
>> is welcome.  As I mentioned, I'm processing a sequenceFile that has Text keys and MapWritable
>> values.  I would like the vertices to have Text indices and MapWritable values.  (I'm not
>> inserting any edges for the time being. . .I just want to see the file get split properly.)
>> I have implemented custom input formats and record readers.  Thx
>> -Dave
>

