spark-issues mailing list archives

From "Khaled Ammar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10945) GraphX computes Pagerank with NaN (with some datasets)
Date Wed, 07 Oct 2015 16:54:26 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947202#comment-14947202 ]

Khaled Ammar commented on SPARK-10945:
--------------------------------------

One more follow-up.

To investigate further why vertex 38 shows up with zero degree in the edge from the previous comment,
"38 0 37455831 47 Infinity", I ran the following code:

{code}
// Vertex view: the out-degree attached to each vertex by the join (0 if none).
graph.outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
  .vertices
  .map { case (vid, vdata) => vid + " " + vdata }
  .saveAsTextFile(outFname + "_degree_joined")

// Triplet view: the same joined attribute, read as the source attribute of each edge.
graph.outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
  .triplets
  .map { e => e.srcId + " " + e.srcAttr }
  .saveAsTextFile(outFname + "_degree_byTriplets")
{code}

In the "_degree_joined" file, 38 shows up with degree 1.
In the "_degree_byTriplets" file, 38 shows up with degree 0.

This may imply that e.srcAttr in the triplet view does not always carry the joined out-degree of the source vertex, so using it to find that degree is not accurate.
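
To make the consequence concrete, here is a minimal sketch in plain Scala (no Spark required). It assumes the edge weight is the reciprocal of the source vertex's out-degree, which would be consistent with the Infinity weight on the edge from vertex 38. The object name ZeroDegreeDemo and the helper edgeWeight are hypothetical and exist only for this illustration; the sketch does not claim to reproduce the exact arithmetic path inside GraphX.

```scala
// Illustration only: shows how a zero out-degree could produce Infinity and then NaN.
// ASSUMPTION: the edge weight is computed as 1.0 / outDegree of the source vertex.
object ZeroDegreeDemo {
  // Reciprocal of the out-degree; an Int 0 is widened to 0.0, so the result is Infinity.
  def edgeWeight(outDegree: Int): Double = 1.0 / outDegree

  def main(args: Array[String]): Unit = {
    val fromJoinedView  = edgeWeight(1) // degree of 38 in "_degree_joined"
    val fromTripletView = edgeWeight(0) // degree of 38 in "_degree_byTriplets"

    println(s"weight with degree 1: $fromJoinedView")
    println(s"weight with degree 0: $fromTripletView")

    // Two ordinary floating-point operations that turn Infinity into NaN.
    println(s"Infinity * 0.0      = ${fromTripletView * 0.0}")
    println(s"Infinity - Infinity = ${fromTripletView - fromTripletView}")
  }
}
```

If the triplet view reports degree 0 for a vertex that actually has an out-edge, any weight derived from e.srcAttr in this way would be Infinity, and a later multiplication by zero or a difference of two infinite values would yield NaN.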


> GraphX computes Pagerank with NaN (with some datasets)
> ------------------------------------------------------
>
>                 Key: SPARK-10945
>                 URL: https://issues.apache.org/jira/browse/SPARK-10945
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.3.0
>         Environment: Linux
>            Reporter: Khaled Ammar
>              Labels: test
>
> Hi,
> I run GraphX on a medium-sized standalone Spark 1.3.0 installation. PageRank typically
> works fine, except with one dataset (Twitter: http://law.di.unimi.it/webdata/twitter-2010).
> This is a public dataset that is commonly used in research papers.
> I found that many vertices have NaN values. This is true even if the algorithm runs
> for only 1 iteration.
> Thanks,
> -Khaled




