spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3190) Creation of large graph(> 2.15 B nodes) seems to be broken:possible overflow somewhere
Date Sun, 24 Aug 2014 01:14:10 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108214#comment-14108214 ]

Apache Spark commented on SPARK-3190:
-------------------------------------

User 'ankurdave' has created a pull request for this issue:
https://github.com/apache/spark/pull/2106

> Creation of large graph(> 2.15 B nodes) seems to be broken:possible overflow somewhere
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-3190
>                 URL: https://issues.apache.org/jira/browse/SPARK-3190
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.0.3
>         Environment: Standalone mode running on EC2. Using latest code from master branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6.
>            Reporter: npanj
>            Priority: Critical
>
> While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API
> returns an incorrect result, while 'numEdges' reports the correct number. A few times (with a
> different dataset of > 2.5B nodes) I have also noticed that numVertices is returned as a negative
> number, so I suspect that there is an overflow somewhere (maybe we are using Int for some field?).
> Here are some details of the experiments I have done so far:
> 1. Input: numNodes=6101995593 ; noEdges=12163784626
>    Graph returns: numVertices=1807028297 ;  numEdges=12163784626
> 2. Input: numNodes=2157586441 ; noEdges=2747322705
>    Graph returns: numVertices=-2137380855 ;  numEdges=2747322705
> 3. Input: numNodes=1725060105 ; noEdges=204176821
>    Graph returns: numVertices=1725060105 ;  numEdges=2041768213
> You can find the code to generate this bug here: 
> https://gist.github.com/npanj/92e949d86d08715bf4bf
> Note: Nodes are labeled 1...6B.
>  
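The reported vertex counts can be checked against the overflow hypothesis with plain arithmetic. The sketch below is only an illustration: it emulates the narrowing of a 64-bit count to a signed 32-bit Int (the helper `to_int32` is a hypothetical name, not part of Spark or GraphX) and compares the result with the numVertices values from the three experiments. It does not show where in GraphX such a narrowing might occur.

```python
def to_int32(n: int) -> int:
    """Wrap an integer into the signed 32-bit range, the way a
    64-bit-to-32-bit narrowing on the JVM would (hypothetical helper)."""
    return (n + 2**31) % 2**32 - 2**31


# (numNodes given as input, numVertices reported by the graph),
# copied from the three experiments in the report above.
experiments = [
    (6101995593, 1807028297),
    (2157586441, -2137380855),
    (1725060105, 1725060105),
]

for num_nodes, reported in experiments:
    wrapped = to_int32(num_nodes)
    # Print the input, its 32-bit wrapped value, and whether it matches the report.
    print(num_nodes, wrapped, wrapped == reported)
```

In all three cases the wrapped value equals the reported numVertices, which is consistent with the count passing through a 32-bit integer somewhere; the third case is unaffected because 1725060105 is below 2^31.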



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

