flink-user mailing list archives

From Greg Hogan <c...@greghogan.com>
Subject Re: Gelly PageRank implementations in 1.2 to 1.3
Date Mon, 24 Jul 2017 14:36:46 GMT
The current algorithm is unweighted, though we should definitely look to add a weighted variant and consider PersonalizedPageRank as well.

Regarding your results: PageRank scores should sum to 1.0 and should be positive unless the damping factor is 1.0, and using the convergence threshold will guarantee accurate results on large graphs.
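The two properties above (scores summing to 1.0, and staying positive for a damping factor below 1.0) can be checked with a minimal power-iteration sketch in plain Java. This is not Gelly's implementation. It assumes Marc's example graph from later in this thread (edges 1->2, 1->1, 2->1, 2->4, vertex 3 isolated) with ids shifted to 0..3, assumes rank held by sink vertices is spread uniformly over all vertices (the NetworkX convention), and uses a hypothetical L1 convergence test; `PageRankSketch` and `pageRank` are illustrative names, and no claim is made that the printed scores match Gelly's output.

```java
import java.util.Arrays;

/** Illustrative power-iteration PageRank; not Gelly code. */
public class PageRankSketch {

    /** Vertices are 0..n-1; each edge is {source, target}. */
    public static double[] pageRank(int n, int[][] edges, double d,
                                    int maxIterations, double threshold) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution summing to 1

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assumption: rank held by sinks (no out-edges) is spread to every vertex.
            double sinkMass = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    sinkMass += rank[v];
                }
            }
            double[] next = new double[n];
            // Random-jump term plus sink share; at least (1 - d) / n, so positive when d < 1.
            Arrays.fill(next, (1.0 - d) / n + d * sinkMass / n);
            for (int[] e : edges) {
                next[e[1]] += d * rank[e[0]] / outDegree[e[0]];
            }
            // Hypothetical convergence test: stop once the total (L1) change is small.
            double delta = 0.0;
            for (int v = 0; v < n; v++) {
                delta += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (delta < threshold) {
                break;
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {0, 0}, {1, 0}, {1, 3}};
        double[] rank = pageRank(4, edges, 0.5, 100, 1e-9);
        System.out.println(Arrays.toString(rank));
        System.out.println("sum = " + Arrays.stream(rank).sum());
    }
}
```

Because every non-sink vertex passes on all of its rank and the sink mass is returned to the pool, each iteration preserves the total, so the sum stays at 1.0 up to floating-point error.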

The PageRank tests compare results against those of the NetworkX implementation. The missing vertex 3 is trivially fixed by adding the call ".setIncludeZeroDegreeVertices(true)" to the VertexDegrees function.
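Why vertex 3 disappears can be sketched without Flink: a degree table built only from the edge list never sees a vertex that has no edges. The plain-Java analogue below is hypothetical (`ZeroDegreeSketch` and `outDegrees` are illustrative names, and it tracks out-degree only, unlike Gelly's VertexDegrees). Its `includeZeroDegree` flag mimics the effect the `.setIncludeZeroDegreeVertices(true)` call is described as having above, by seeding the table with every vertex id first.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative degree counting; not Gelly's VertexDegrees. */
public class ZeroDegreeSketch {

    /** Out-degree per vertex; optionally seeded with all vertex ids at degree 0. */
    public static Map<Integer, Integer> outDegrees(Set<Integer> vertices, int[][] edges,
                                                   boolean includeZeroDegree) {
        Map<Integer, Integer> degree = new TreeMap<>();
        if (includeZeroDegree) {
            for (int v : vertices) {
                degree.put(v, 0); // isolated vertices such as 3 are kept
            }
        }
        for (int[] e : edges) {
            degree.merge(e[0], 1, Integer::sum); // count the out-edge at its source
            degree.putIfAbsent(e[1], 0);         // targets appear even without out-edges
        }
        return degree;
    }

    public static void main(String[] args) {
        Set<Integer> vertices = Set.of(1, 2, 3, 4);
        int[][] edges = {{1, 2}, {1, 1}, {2, 1}, {2, 4}};
        System.out.println(outDegrees(vertices, edges, false)); // {1=2, 2=2, 4=0}
        System.out.println(outDegrees(vertices, edges, true));  // {1=2, 2=2, 3=0, 4=0}
    }
}
```

Any downstream computation that starts from such a table inherits the omission, which matches the missing vertex 3 in the 1.3.x output in the message below.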


> On Jul 23, 2017, at 6:38 AM, Kaepke, Marc <marc.kaepke@haw-hamburg.de> wrote:
> 
> Hi Greg,
> 
> I am doing an evaluation of Gelly and GraphX (Spark). Both frameworks implement PageRank, and Gelly provides a lot of variants (*thumbs up*).
> During a really small initial test I get the same ranking for the vertex-centric, scatter-gather and GSA versions. Only the implementation in 1.3.x (without any graph model) computed a different result (ranking).
> 
> /* vertex centric */
> DataSet<Vertex<Double, Double>> pagerankVC = small.run(new PageRank<>(0.5, 10));
> System.err.println("VC");
> pagerankVC.printToErr();
> 
> /* scatter gather */
> DataSet<Vertex<Double, Double>> pageRankSG = small
>     .run(new org.apache.flink.graph.library.PageRank<>(0.5, 10));
> System.err.println("SG");
> pageRankSG.printToErr();
> 
> /* gsa */
> DataSet<Vertex<Double, Double>> pageRankGSA = small.run(new GSAPageRank<>(0.5, 10));
> System.err.println("GSA");
> pageRankGSA.printToErr();
> 
> /* without graph model */
> DataSet<Result<Double>> pageRankDI = small
>     .run(new PageRank<>(0.5, 10));
> System.err.println("delta iteration");
> pageRankDI.printToErr();
> My input graph is:
> vertices
> id 1, val 0
> id 2, val 0
> id 3, val 0
> id 4, val 0
> edges
> src 1, trg 2, val 3
> src 1, trg 1, val 2
> src 2, trg 1, val 3
> src 2, trg 4, val 6
> 
> Ranking output
> vertex-centric
> id 4 with 1.16
> id 1 with 1.103
> id 2 with 0.815
> id 3 with 0
> sg and gsa
> id 4 with 2.208
> id 1 with 2.114
> id 2 with 1.546
> id 3 with 0
> new PageRank in Gelly 1.3.X
> id 1 with 0.392
> id 2 with 0.313
> id 4 with 0.294
> 
> Do you know why?
> 
> 
> Best
> Marc
> 
> 
>> On Jul 23, 2017, at 02:22, Greg Hogan <code@greghogan.com> wrote:
>> 
>> Hi Marc,
>> 
>> PageRank and GSAPageRank were moved to the flink-gelly-examples jar in the org.apache.flink.graph.examples package. A library algorithm was added that supports both source and sink vertices. This limitation of the old algorithms was noted in the class documentation, and I understand it to be an effect of delta iterations. The new implementation is also significantly faster (https://github.com/apache/flink/pull/2733#issuecomment-278789830).
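One way a sink-vertex limitation can show up is lost rank mass. The plain-Java sketch below is an illustration only, not the old or new Gelly code: it runs a single PageRank step on Marc's example graph (ids shifted to 0..3, damping factor 0.5, uniform start) with and without handing the rank held by sinks back to all vertices. `SinkMassSketch` and `step` are hypothetical names, and uniform redistribution is an assumed convention (the one NetworkX uses).

```java
import java.util.Arrays;

/** Illustrative single PageRank step; not Gelly code. */
public class SinkMassSketch {

    /** One step; if redistributeSinks is false, rank held by sinks is simply dropped. */
    public static double[] step(int n, int[][] edges, double[] rank, double d,
                                boolean redistributeSinks) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }
        double sinkMass = 0.0;
        if (redistributeSinks) {
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    sinkMass += rank[v];
                }
            }
        }
        double[] next = new double[n];
        Arrays.fill(next, (1.0 - d) / n + d * sinkMass / n);
        for (int[] e : edges) {
            next[e[1]] += d * rank[e[0]] / outDegree[e[0]];
        }
        return next;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {0, 0}, {1, 0}, {1, 3}};
        double[] start = {0.25, 0.25, 0.25, 0.25};
        double leaky = Arrays.stream(step(4, edges, start, 0.5, false)).sum();
        double kept = Arrays.stream(step(4, edges, start, 0.5, true)).sum();
        System.out.println("without sink handling: " + leaky); // 0.75
        System.out.println("with sink handling:    " + kept);  // 1.0
    }
}
```

Vertices 3 (isolated) and 4 (no out-edges) are sinks here, so without redistribution half of their combined 0.5 rank (the damped part) vanishes in the very first step.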
>> 
>> PageRank can be run from the command line using the examples jar, for example as follows (don't wildcard the jar file as in the documentation until we get the javadoc jar removed from the next release).
>> 
>> $ mv opt/flink-gelly* lib/
>> $ ./bin/flink run examples/gelly/flink-gelly-examples_2.11-1.3.1.jar \
>>     --algorithm PageRank \
>>     --input CSV --type integer --simplify directed --input_filename <filename> --input_field_delimiter $'\t' \
>>     --output print
>> 
>> The output can also be written to CSV in similar fashion to the input.
>> 
>> The code to call the library PageRank from the examples driver is the same as for any GraphAlgorithm (https://github.com/apache/flink/blob/release-1.3/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/drivers/PageRank.java):
>> 
>> graph.run(new PageRank<K, VV, EV>(dampingFactor, iterations, convergenceThreshold));
>> 
>> Please let us know of any issues or additional questions!
>> 
>> Greg
>> 
>> 
>>> On Jul 22, 2017, at 4:33 PM, Kaepke, Marc <marc.kaepke@haw-hamburg.de> wrote:
>>> 
>>> Hi there,
>>> 
>>> Why was the PageRank version (which implements the GraphAlgorithm interface) removed in 1.3?
>>> 
>>> How can I use the new PageRank implementation in 1.3.x?
>>> 
>>> Why doesn't PageRank use the graph processing models (vertex-centric, SG or GSA) anymore?
>>> 
>>> Thanks!
>>> 
>>> Bests,
>>> marc
