spark-dev mailing list archives

From Jeffrey Picard <jpic...@placeiq.com>
Subject Re: Graphx seems to be broken while Creating a large graph(6B nodes in my case)
Date Sat, 23 Aug 2014 06:42:55 GMT
I’m seeing this issue as well. I have a graph with 5828339535 vertices and 7398447992 edges;
graph.numVertices returns 1533266498, while graph.numEdges correctly returns 7398447992.
I’m also hitting a problem that I’m beginning to suspect has the same underlying cause:
connected components stops after one iteration and returns an incorrect graph.
On Aug 22, 2014, at 8:43 PM, npanj <nitinpanj@gmail.com> wrote:

> While creating a graph with 6B nodes and 12B edges, I noticed that the
> *'numVertices' API returns an incorrect result*; 'numEdges' reports the correct
> number. A few times (with different datasets of > 2.5B nodes) I have also
> noticed that numVertices is returned as a negative number, so I suspect that there
> is some overflow (maybe we are using Int for some field?).
> 
> Environment: Standalone mode running on EC2. Using the latest code from the master
> branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6.
> 
> Here are some details of the experiments I have done so far:
> 1. Input: numNodes=6101995593 ; noEdges=12163784626
> Graph returns: numVertices=1807028297 ; numEdges=12163784626
> 2. Input: numNodes=*2157586441* ; noEdges=2747322705
> Graph returns: numVertices=*-2137380855* ; numEdges=2747322705
> 3. Input: numNodes=1725060105 ; noEdges=204176821
> Graph returns: numVertices=1725060105 ; numEdges=2041768213
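
The numVertices values in experiments 1 and 2 above are what you would get if the true count were truncated to a signed 32-bit integer at some point. The following is only an arithmetic sketch of that hypothesis, not GraphX code: the class and method names are made up for illustration, and it says nothing about where in GraphX such a truncation might happen.

```java
public class VertexCountWrap {
    // Hypothetical helper: keep only the low 32 bits of a 64-bit count,
    // which is what a narrowing cast from long to int does on the JVM.
    static int wrapToInt(long trueCount) {
        return (int) trueCount;
    }

    public static void main(String[] args) {
        // numNodes inputs taken from experiments 1-3 in the report above.
        long[] inputs = {6101995593L, 2157586441L, 1725060105L};
        for (long n : inputs) {
            System.out.println(n + " -> " + wrapToInt(n));
        }
    }
}
```

The first two inputs wrap to 1807028297 and -2137380855, matching the reported numVertices; the third is below 2^31 and is unchanged, which is also what was reported.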
> 
> 
> You can find the code to generate this bug here:
> https://gist.github.com/npanj/92e949d86d08715bf4bf
> 
> (I have also filed this jira ticket:
> https://issues.apache.org/jira/browse/SPARK-3190)
> 