flink-user mailing list archives

From Vasiliki Kalavri <vasilikikala...@gmail.com>
Subject Re: Apache Flink 1.1.4 - Gelly - LocalClusteringCoefficient - Returning values above 1?
Date Fri, 20 Jan 2017 19:06:31 GMT
Hi Miguel,

the LocalClusteringCoefficient algorithm returns a DataSet of type Result,
which basically wraps a vertex id, its degree, and the number of triangles
containing this vertex. The number 11 you see is indeed the degree of
vertex 5113. The Result type contains the method
getLocalClusteringCoefficientScore() which allows you to retrieve the
clustering coefficient score for a vertex. The method simply divides the
number of triangles by the number of potential edges between neighbors.
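
As a rough illustration of that division, here is a plain-Java sketch. It
is not Gelly's actual code: the class and method names are made up for this
example, and returning NaN for a vertex with fewer than two neighbors is an
assumption. For an undirected graph, a vertex of degree d has d * (d - 1) / 2
potential edges between its neighbors.

```java
// Hypothetical helper, not part of Flink/Gelly: recomputes the score from
// the two numbers wrapped by Result (degree and triangle count).
class ClusteringScoreSketch {

    // Undirected case: triangles divided by the number of neighbor pairs.
    static double score(long degree, long triangleCount) {
        long neighborPairs = degree * (degree - 1) / 2;
        if (neighborPairs == 0) {
            // Assumption: the score is undefined when degree < 2.
            return Double.NaN;
        }
        return triangleCount / (double) neighborPairs;
    }

    public static void main(String[] args) {
        // Vertex 5113 from the output quoted below: degree 11, 0 triangles.
        System.out.println("vertex 5113 score = " + score(11, 0));
    }
}
```

With the reported values (11, 0) the result is 0.0, which is within the
documented range of 0.0 to 1.0.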

I'm sorry that this is not clear in the docs. We should definitely
improve them to explain what the output is and how to retrieve the actual
clustering coefficient values. I have opened a JIRA for this [1].

Cheers,
-Vasia.

[1]: https://issues.apache.org/jira/browse/FLINK-5597

On 20 January 2017 at 19:31, Miguel Coimbra <miguel.e.coimbra@gmail.com>
wrote:

> Hello,
>
> In the documentation of the LocalClusteringCoefficient algorithm, it is
> said:
>
>
> *The local clustering coefficient measures the connectedness of each
> vertex’s neighborhood. Scores range from 0.0 (no edges between neighbors) to
> 1.0 (neighborhood is a clique).*
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/batch/libs/gelly.html#local-clustering-coefficient
> <https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/gelly/library_methods.html#local-clustering-coefficient>
>
> However, upon running the algorithm (undirected version), I obtained
> values above 1.
>
> The result I got was this. As you can see, vertex 5113 has a score of 11:
> (the input edges for the graph are shown further below - around *35 edges*
> ):
>
> (4907,(1,0))
> *(5113,(11,0))*
> (6008,(0,0))
> (6064,(1,0))
> (6065,(1,0))
> (6107,(0,0))
> (6192,(0,0))
> (6252,(1,0))
> (6279,(1,0))
> (6465,(1,0))
> (6545,(0,0))
> (6707,(1,0))
> (6715,(1,0))
> (6774,(0,0))
> (7088,(0,0))
> (7089,(1,0))
> (7171,(0,0))
> (7172,(1,0))
> (7763,(0,0))
> (7976,(1,0))
> (8056,(1,0))
> (9748,(1,0))
> (10191,(1,0))
> (10370,(1,0))
> (10371,(1,0))
> (14310,(1,0))
> (16785,(1,0))
> (19801,(1,0))
> (26284,(1,0))
> (26562,(0,0))
> (31724,(1,0))
> (32443,(1,0))
> (32938,(0,0))
> (33855,(1,0))
> (37929,(0,0))
>
> This was from a small isolated test with these edges:
>
> 5113    6008
> 5113    6774
> 5113    32938
> 5113    6545
> 5113    7088
> 5113    37929
> 5113    26562
> 5113    6107
> 5113    7171
> 5113    6192
> 5113    7763
> 9748    5113
> 10191    5113
> 6064    5113
> 6065    5113
> 6279    5113
> 4907    5113
> 6465    5113
> 6707    5113
> 7089    5113
> 7172    5113
> 14310    5113
> 6252    5113
> 33855    5113
> 7976    5113
> 26284    5113
> 8056    5113
> 10371    5113
> 16785    5113
> 19801    5113
> 6715    5113
> 31724    5113
> 32443    5113
> 10370    5113
>
> I am not sure what I may be doing wrong, but is there perhaps some form of
> normalization lacking in my execution of:
>
> org.apache.flink.graph.library.clustering.undirected.LocalClusteringCoefficient.Result;
> org.apache.flink.graph.library.clustering.undirected.LocalClusteringCoefficient;
>
> Am I supposed to divide all scores by the greatest score obtained by the
> algorithm?
>
> Thank you very much!
>
> Miguel E. Coimbra
> Email: miguel.e.coimbra@gmail.com <miguel.e.coimbra@ist.utl.pt>
> Skype: miguel.e.coimbra
>
