incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Vertex exists error when processing input splits for Sequence file
Date Mon, 30 Jan 2012 17:26:53 GMT
Glad to hear you figured it out!  Keep us informed on how your 
experiments are going and what we can do to help.

Avery

On 1/30/12 7:50 AM, David Garcia wrote:
> Thx again, Avery, for your prompt responses. The cause you suggested
> turned out not to be the actual problem, but you led me in the
> right direction. It turns out that all my Vertex instances were
> unique (i.e., a new vertex was created on every call to
> getCurrentVertex()). However, SequenceFileRecordReader reuses the same
> key and value objects for its getCurrentKey() and getCurrentValue()
> methods, so every time the record reader advances to the next record
> (nextKeyValue()), those shared objects are overwritten in place.
> This was a real pain to figure out. Thx again for all your help!!
>
> -David
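A minimal, self-contained Java sketch of the reuse pitfall described above. `ReusingReader`, `MutableId`, and `Vertex` are hypothetical stand-ins, not Hadoop or Giraph classes. The reader overwrites one shared key object on every `nextKeyValue()`, so vertices that store that reference all end up with the last id, while vertices that store a copy keep their own. In real Hadoop code the equivalent fix would presumably be to copy before storing, e.g. `new Text(key)` for a `Text` key or `WritableUtils.clone(value, conf)` for a `MapWritable` value. That fix is an assumption, not code from the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedKeyDemo {

    /** Mutable id holder, loosely modeled on Hadoop's Text (hypothetical). */
    static final class MutableId {
        private String value = "";
        void set(String v) { value = v; }
        String get() { return value; }
        MutableId copy() {
            MutableId c = new MutableId();
            c.set(value);
            return c;
        }
    }

    /** Hypothetical reader that overwrites one shared key on each advance. */
    static final class ReusingReader {
        private final String[] ids;
        private final MutableId sharedKey = new MutableId();
        private int pos = -1;

        ReusingReader(String... ids) { this.ids = ids; }

        boolean nextKeyValue() {
            if (++pos >= ids.length) {
                return false;
            }
            sharedKey.set(ids[pos]); // same object, new contents
            return true;
        }

        MutableId getCurrentKey() { return sharedKey; }
    }

    /** A fresh vertex object per record, as in the getCurrentVertex() quoted below. */
    static final class Vertex {
        final MutableId id;
        Vertex(MutableId id) { this.id = id; }
    }

    /** Reads all records; copyKey decides whether the key is defensively copied. */
    static List<String> readIds(boolean copyKey, String... input) {
        ReusingReader reader = new ReusingReader(input);
        List<Vertex> vertices = new ArrayList<>();
        while (reader.nextKeyValue()) {
            MutableId key = reader.getCurrentKey();
            vertices.add(new Vertex(copyKey ? key.copy() : key));
        }
        List<String> ids = new ArrayList<>();
        for (Vertex v : vertices) {
            ids.add(v.id.get());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(readIds(false, "00kK4", "mM424")); // [mM424, mM424]
        System.out.println(readIds(true, "00kK4", "mM424"));  // [00kK4, mM424]
    }
}
```

Note that creating a new outer `Vertex` per record is not enough on its own: the copy has to happen at the point where the reader's object is captured.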
>
> From: David Garcia <dgarcia@potomacfusion.com>
> Reply-To: giraph-user@incubator.apache.org
> Date: Mon, 30 Jan 2012 08:48:21 -0600
> To: giraph-user@incubator.apache.org
> Subject: Re: Vertex exists error when processing input splits for Sequence file
>
> OK, that's a good point. My getCurrentVertex() method looks like this:
>
> @Override
> public BasicVertex<I, V, E, M> getCurrentVertex()
>         throws IOException, InterruptedException {
>     // A new vertex object is created for every record.
>     BasicVertex<I, V, E, M> vertex =
>         BspUtils.createVertex(getContext().getConfiguration());
>
>     // getCurrentKey()/getCurrentValue() return the record reader's own
>     // objects directly; no copy is made here.
>     I vertexID = (I) getRecordReader().getCurrentKey();
>     V vertexValue = (V) getRecordReader().getCurrentValue();
>     try {
>         vertex.initialize(vertexID, vertexValue, null, null);
>     } catch (Exception e) {
>         e.printStackTrace();
>     }
>     return vertex;
> }
>
>
> Perhaps BspUtils is reusing it?
>
> From: Avery Ching <aching@apache.org>
> Reply-To: giraph-user@incubator.apache.org
> Date: Mon, 30 Jan 2012 02:37:09 -0600
> To: giraph-user@incubator.apache.org
> Subject: Re: Vertex exists error when processing input splits for Sequence file
>
> In your implementation of VertexReader#getCurrentVertex(), are you
> providing a new BasicVertex object each time (after nextVertex() is
> called)? If you are reusing the same BasicVertex object, you could get
> problems like the ones you describe.
>
> Avery
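The contract Avery is asking about can be illustrated with a hypothetical reader that follows the same `nextVertex()`/`getCurrentVertex()` pattern. `SimpleVertex` and `SimpleVertexReader` are invented for this sketch and are not Giraph types. The point is only that a caller which keeps every returned vertex (as a partition does) needs a distinct object per record.

```java
import java.util.ArrayList;
import java.util.List;

public class VertexReuseDemo {

    /** Minimal mutable vertex; a hypothetical stand-in for BasicVertex. */
    static final class SimpleVertex {
        String id;
    }

    /** Hypothetical reader following the nextVertex()/getCurrentVertex() pattern. */
    static final class SimpleVertexReader {
        private final String[] ids;
        private final boolean reuse;
        private final SimpleVertex shared = new SimpleVertex();
        private int pos = -1;

        SimpleVertexReader(boolean reuse, String... ids) {
            this.reuse = reuse;
            this.ids = ids;
        }

        boolean nextVertex() {
            return ++pos < ids.length;
        }

        SimpleVertex getCurrentVertex() {
            // With reuse, every reference the caller kept now sees the latest id.
            SimpleVertex v = reuse ? shared : new SimpleVertex();
            v.id = ids[pos];
            return v;
        }
    }

    /** Mirrors a loading loop that keeps every vertex it reads. */
    static List<String> loadIds(boolean reuse, String... ids) {
        SimpleVertexReader reader = new SimpleVertexReader(reuse, ids);
        List<SimpleVertex> kept = new ArrayList<>();
        while (reader.nextVertex()) {
            kept.add(reader.getCurrentVertex());
        }
        List<String> out = new ArrayList<>();
        for (SimpleVertex v : kept) {
            out.add(v.id);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(loadIds(true, "a", "b", "c"));  // [c, c, c]
        System.out.println(loadIds(false, "a", "b", "c")); // [a, b, c]
    }
}
```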
>
> On 1/30/12 12:24 AM, David Garcia wrote:
>> Thx for the response, Avery. Unfortunately, I can confirm that I do
>> not have duplicates in my data. I have narrowed the problem down to the
>> following method:
>>
>> private VertexEdgeCount readVerticesFromInputSplit(
>>         InputSplit inputSplit) throws IOException, InterruptedException {
>>     ...
>>     while (vertexReader.nextVertex()) {
>>         BasicVertex<I, V, E, M> readerVertex =
>>             vertexReader.getCurrentVertex();
>>         ...
>> When nextVertex() is called, it automatically mutates every HashMap in
>> the Partitions held in the inputSplitCache: every Vertex in the
>> respective partition is converted into the next vertex returned by
>> nextVertex(). (Again, note that the underlying RecordReader is a
>> SequenceFileRecordReader.) For example, if I have the following
>> inputSplitCache:
>>
>> inputSplitCache
>>   [0]
>>     Key -> BasicPartitionOwner...
>>     Value -> Partition
>>       Conf -> Configuration...
>>       partitionID = 0
>>       vertexMap
>>         [0] -> 00kK4...
>>
>> I have one vertex in my partition. Assuming that the next vertex
>> ID is mM424, after vertexReader.nextVertex() is called, the data
>> structure changes to this:
>>
>> inputSplitCache
>>   [0]
>>     Key -> BasicPartitionOwner...
>>     Value -> Partition
>>       Conf -> Configuration...
>>       partitionID = 0
>>       vertexMap
>>         [0] -> mM424...
>>
>> After partition.putVertex(...) is called, a second, identical vertex
>> is added:
>>
>> inputSplitCache
>>   [0]
>>     Key -> BasicPartitionOwner...
>>     Value -> Partition
>>       Conf -> Configuration...
>>       partitionID = 0
>>       vertexMap
>>         [0] -> mM424...
>>         [1] -> mM424...
>>
>> This leads to the error in my previous email. As this pattern
>> suggests, all the vertices in my graph end up with the data of my final
>> vertex. It's almost as if some weird AspectJ advice is intercepting the
>> call to nextVertex(). I'm happy to brandish my code; I feel it's
>> fairly simple, just a SequenceFile input format and a trivial
>> vertex class.
>>
>> -Dave
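One plausible mechanism for both the "identical vertex is added" symptom and the later "already exists!" error, assuming the partition keeps its vertices in a `java.util.HashMap` keyed by a mutable, content-hashed id object stored by reference (as a reused `Text` key would be). `MutableId` is a hypothetical stand-in for `Text`, and `moveWithDuplicateCheck` only imitates the kind of duplicate check seen in the stack trace; neither is Giraph code. Because `HashMap` caches each entry's hash at insertion time, mutating the key in place leaves the old entry in its old slot, so the next `put` of the same (now mutated) object creates a second entry.

```java
import java.util.HashMap;
import java.util.Map;

public class MutatedKeyDemo {

    /** Content-based equals/hashCode, like Hadoop's Text (hypothetical stand-in). */
    static final class MutableId {
        private String value;
        MutableId(String value) { this.value = value; }
        void set(String v) { value = v; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableId && ((MutableId) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
        @Override public String toString() { return value; }
    }

    /** Simulates a vertex map filled through a reader that reuses its key object. */
    static Map<MutableId, String> buildPartition() {
        MutableId sharedKey = new MutableId("00kK4");
        Map<MutableId, String> vertexMap = new HashMap<>();
        vertexMap.put(sharedKey, "vertex 1");
        // The reader advances and overwrites the shared key in place: the entry
        // already in the map now reports the new id but keeps its old cached hash.
        sharedKey.set("mM424");
        // Putting the same object again does not match the stale entry's cached
        // hash, so a second entry is created instead of replacing the first.
        vertexMap.put(sharedKey, "vertex 2");
        return vertexMap;
    }

    /** Copies entries into a fresh map, rejecting duplicate ids. */
    static Map<MutableId, String> moveWithDuplicateCheck(Map<MutableId, String> source) {
        Map<MutableId, String> target = new HashMap<>();
        for (Map.Entry<MutableId, String> e : source.entrySet()) {
            if (target.containsKey(e.getKey())) {
                throw new IllegalStateException("Vertex " + e.getKey() + " already exists!");
            }
            target.put(e.getKey(), e.getValue());
        }
        return target;
    }

    public static void main(String[] args) {
        Map<MutableId, String> partition = buildPartition();
        System.out.println(partition.size() + " " + partition.keySet()); // 2 [mM424, mM424]
        try {
            moveWithDuplicateCheck(partition);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // Vertex mM424 already exists!
        }
    }
}
```

The `java.util.Map` documentation warns about exactly this: a map's behavior is unspecified if a key's `equals`-relevant state changes while it is in the map.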
>>
>> From: Avery Ching <aching@apache.org>
>> Reply-To: giraph-user@incubator.apache.org
>> Date: Mon, 30 Jan 2012 01:28:12 -0600
>> To: giraph-user@incubator.apache.org
>> Subject: Re: Vertex exists error when processing input splits for Sequence file
>>
>> Hi David,
>>
>> So from the errors, it appears that your input has multiple vertices
>> with the same vertex id. Currently we throw an exception to prevent
>> this, as it is typically not what you want. You probably want to watch
>> the vertices being produced by the vertex input format and see why you
>> are getting duplicates. The cause is likely either an error in the data
>> (it actually has vertices with the same vertex id) or an error in your
>> custom vertex input format.
>>
>> To help debug, you might want to add some logging to your record
>> reader and print the vertex ids, or add some logging where that code
>> is called in BspServiceWorker#readVerticesFromInputSplit().
>>
>> Hope that helps,
>>
>> Avery
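A sketch of the logging suggested above, written in plain Java so it runs on its own. `VertexIdAudit` is a hypothetical helper, and `java.util.logging` is used only to avoid dependencies; a real job would more likely use whatever logger its code already uses. The idea is to call `record(...)` once per record from the reader loop. It snapshots each id with `String.valueOf(...)` at read time, because logging or storing the id object itself would not help if the reader later overwrites that object in place.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

public class VertexIdAudit {
    private static final Logger LOG = Logger.getLogger(VertexIdAudit.class.getName());

    private final Set<String> seen = new HashSet<>();
    private final List<String> duplicates = new ArrayList<>();

    /** Records one id as it is read; returns true if it was already seen. */
    public boolean record(Object id) {
        // Snapshot as an immutable String now; a reused key object may change later.
        String snapshot = String.valueOf(id);
        LOG.info("read vertex id " + snapshot);
        boolean duplicate = !seen.add(snapshot);
        if (duplicate) {
            duplicates.add(snapshot);
            LOG.warning("duplicate vertex id " + snapshot);
        }
        return duplicate;
    }

    /** Ids seen more than once, in the order the repeats were encountered. */
    public List<String> duplicates() {
        return duplicates;
    }

    public static void main(String[] args) {
        VertexIdAudit audit = new VertexIdAudit();
        for (String id : new String[] {"a", "b", "a"}) {
            audit.record(id);
        }
        System.out.println(audit.duplicates()); // [a]
    }
}
```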
>>
>> On 1/29/12 8:13 PM, David Garcia wrote:
>>>
>>>
>>> Hello, I get this error when I try to run my job:
>>> 2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: reserveInputSplit: reservedPath = null, 1 of 1 InputSplits are finished.
>>> 2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: setup: Finally loaded a total of (v=0, e=0)
>>> 2012-01-29 21:50:18,764 INFO org.apache.giraph.graph.BspService: process: inputSplitsAllDoneChanged (all vertices sent from input splits)
>>> 2012-01-29 21:50:18,766 ERROR org.apache.giraph.graph.GraphMapper: setup: Caught exception just before end of setup
>>> java.lang.IllegalStateException: moveVerticesToWorker: Vertex Vertex(id=zzYNBgKt2LF6ClLA2eMBzuN7SkA.,value=org.apache.hadoop.io.MapWritable@5ce8787a,#edges=0) already exists!
>>> 	at org.apache.giraph.graph.BspServiceWorker.movePartitionsToWorker(BspServiceWorker.java:1389)
>>> 	at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:624)
>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>> ...
>>> I'm not sure where to start debugging. BspServiceWorker is hella big.
>>> All input is welcome. As I mentioned, I'm processing a SequenceFile that
>>> has Text keys and MapWritable values. I would like the vertices to have
>>> Text ids and MapWritable values. (I'm not inserting any edges for the
>>> time being; I just want to see the file get split properly.) I have
>>> implemented custom input formats and record readers. Thx
>>> -Dave
>>
>

