flink-dev mailing list archives

From "Andra Lungu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-2361) flatMap + distinct gives erroneous results for big data sets
Date Tue, 14 Jul 2015 21:31:05 GMT
Andra Lungu created FLINK-2361:

             Summary: flatMap + distinct gives erroneous results for big data sets
                 Key: FLINK-2361
                 URL: https://issues.apache.org/jira/browse/FLINK-2361
             Project: Flink
          Issue Type: Bug
          Components: Gelly
    Affects Versions: 0.10
            Reporter: Andra Lungu

When running the simple Connected Components algorithm (currently in Gelly) on the Twitter
follower graph with 1, 100, or 10000 iterations, I get the following error:

Caused by: java.lang.Exception: Target vertex '657282846' does not exist!.
	at org.apache.flink.graph.spargel.VertexCentricIteration$VertexUpdateUdfSimpleVV.coGroup(VertexCentricIteration.java:300)
	at org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:220)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
	at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:107)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:722)

Now this is very bizarre, as the DataSet of vertices is produced from the DataSet of edges,
which means there cannot be an edge with an invalid target id. The method calls flatMap
to isolate the src and trg ids, and distinct to ensure their uniqueness.
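
To make the expected invariant concrete, here is a minimal sketch of the flatMap + distinct derivation, with a check that every edge target appears among the derived vertices. It uses plain java.util.stream rather than Flink's DataSet API, so it only illustrates the logic and does not reproduce the distributed execution where the failure shows up. The Edge class and the method names verticesFromEdges and allTargetsExist are hypothetical, introduced for this example only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class VertexDerivation {

    /** Hypothetical edge type: a (source id, target id) pair. */
    static final class Edge {
        final long src;
        final long trg;

        Edge(long src, long trg) {
            this.src = src;
            this.trg = trg;
        }
    }

    /** Emits both endpoints of every edge (flatMap), then removes duplicates (distinct). */
    static List<Long> verticesFromEdges(List<Edge> edges) {
        return edges.stream()
                .flatMap(e -> Stream.of(e.src, e.trg))
                .distinct()
                .collect(Collectors.toList());
    }

    /** Checks the invariant the report relies on: every edge target is a known vertex. */
    static boolean allTargetsExist(List<Edge> edges, List<Long> vertices) {
        Set<Long> known = new HashSet<>(vertices);
        return edges.stream().allMatch(e -> known.contains(e.trg));
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(new Edge(1L, 2L), new Edge(2L, 3L), new Edge(1L, 3L));
        List<Long> vertices = verticesFromEdges(edges);
        System.out.println("vertices: " + vertices);
        System.out.println("all targets exist: " + allTargetsExist(edges, vertices));
    }
}
```

If the vertex set is built this way, allTargetsExist should hold for any edge list, which is why a "Target vertex does not exist" error points at the execution of the operators rather than at the input data.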

The algorithm works fine for smaller data sets... 

