spark-issues mailing list archives

From "Peter Fontana (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-3206) Error in PageRank values
Date Mon, 25 Aug 2014 17:14:58 GMT

     [ https://issues.apache.org/jira/browse/SPARK-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Fontana updated SPARK-3206:
---------------------------------

    Description: 
I have found a small example where the PageRank values computed by run and runUntilConvergence
differ substantially.

I am running the PageRank module on the following graph:

Edge Table:

| Node1 | Node2 |
| 1 | 2 |
| 1 | 3 |
| 3 | 2 |
| 3 | 4 |
| 5 | 3 |
| 6 | 7 |
| 7 | 8 |
| 8 | 9 |
| 9 | 7 |

Node Table (note the extra node):

| NodeID | NodeName |
| a | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
| e | 5 |
| f | 6 |
| g | 7 |
| h | 8 |
| i | 9 |
| j.longaddress.com | 10 |

with the default resetProb of 0.15.
When I compute PageRank with runUntilConvergence, running

```
val ranks = PageRank.runUntilConvergence(graph, 0.0001).vertices
```
I get the ranks
(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,1.3299054047985106)
(9,1.2381240056453071)
(8,1.2803346052504254)
(10,0.15)
(5,0.15)
(2,0.35878124999999994)

However, when I run PageRank with the run() method, running val ranksI = PageRank.run(graph, 100).vertices
I get the ranks

(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,0.9999999387662847)
(9,0.9999999256447741)
(8,0.9999999256447741)
(10,0.15)
(5,0.15)
(2,0.29503124999999997)

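These results differ for vertices 2, 7, 8, and 9 (for example, vertex 7 is about 1.33 from
runUntilConvergence but about 1.00 from run), leading me to suspect that one of the PageRank
methods is incorrect. I have examined the source, but I do not know what the correct fix is,
or which set of values is correct.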
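
As a cross-check that does not depend on GraphX, the ranks can be recomputed with a plain
power iteration. The sketch below assumes the update rule
rank(v) = resetProb + (1 - resetProb) * sum over edges u -> v of rank(u) / outDegree(u),
which is an assumption about the formulation GraphX intends and has not been checked against
the 1.0.2 source. The object name PageRankCheck and its helpers are hypothetical, and every
vertex is assumed to start at 1.0.

```scala
object PageRankCheck {
  // Edge list copied from the report: (source, destination).
  val edges: Seq[(Int, Int)] = Seq(
    (1, 2), (1, 3), (3, 2), (3, 4), (5, 3),
    (6, 7), (7, 8), (8, 9), (9, 7))

  // Vertices 1..10; vertex 10 is the extra node with no edges.
  val vertices: Seq[Int] = 1 to 10

  val resetProb = 0.15

  // Out-degree of every vertex that has at least one outgoing edge.
  val outDegree: Map[Int, Int] =
    edges.groupBy(_._1).map { case (src, es) => src -> es.size }

  // One synchronous update of the ASSUMED rule:
  //   rank(v) = resetProb + (1 - resetProb) * sum(rank(u) / outDegree(u)) over edges u -> v
  def step(ranks: Map[Int, Double]): Map[Int, Double] =
    vertices.map { v =>
      val incoming = edges.collect {
        case (src, dst) if dst == v => ranks(src) / outDegree(src)
      }.sum
      v -> (resetProb + (1 - resetProb) * incoming)
    }.toMap

  // Apply the update a fixed number of times, starting from 1.0 everywhere (assumed start).
  def pageRank(iterations: Int): Map[Int, Double] = {
    val initial = vertices.map(v => v -> 1.0).toMap
    (1 to iterations).foldLeft(initial)((ranks, _) => step(ranks))
  }

  def main(args: Array[String]): Unit =
    pageRank(200).toSeq.sortBy(_._1).foreach { case (v, r) => println(s"($v,$r)") }
}
```

Under that assumed rule, vertex 2 works out to 0.15 + 0.85 * (0.15/2 + 0.34125/2) = 0.35878125
and vertex 7 to 0.513375 / 0.385875, roughly 1.3304. Both are closer to the runUntilConvergence
output than to the run() output, though that conclusion only holds if the assumed rule is the
intended one.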

  was:
I have found a small example where the PageRank values using run and runUntilConvergence differ
quite a bit.

I am running the Pagerank module on the following graph:

Edge Table:

| Node1  | Node2  |
|1 | 2 |
|1 |	3|
3 |	2
3 |	4
5 |	3
6 |	7
7 |	8
8 |	9
9 |	7

Node Table (note the extra node):

| NodeID  | NodeName  |
| ------------- | ------------- |
a |	1
b |	2
c |	3
d |	4
e |	5
f |	6
g |	7
h |	8
i |	9
j.longaddress.com |	10

with a default resetProb of 0.15.
When I compute the pageRank with runUntilConvergence, running  val ranks = PageRank.runUntilConvergence(graph,0.0001).vertices

I get the ranks
(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,1.3299054047985106)
(9,1.2381240056453071)
(8,1.2803346052504254)
(10,0.15)
(5,0.15)
(2,0.35878124999999994)

However, when I run page Rank with the run() method, running  val ranksI = PageRank.run(graph,100).vertices
I get the page ranks

(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,0.9999999387662847)
(9,0.9999999256447741)
(8,0.9999999256447741)
(10,0.15)
(5,0.15)
(2,0.29503124999999997)

These are quite different, leading me to suspect that one of the PageRank methods is incorrect.
I have examined the source, but I do not know what the correct fix is, or which set of values
is correct.


> Error in PageRank values
> ------------------------
>
>                 Key: SPARK-3206
>                 URL: https://issues.apache.org/jira/browse/SPARK-3206
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.0.2
>         Environment: UNIX with Hadoop
>            Reporter: Peter Fontana
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)


