[ https://issues.apache.org/jira/browse/FLINK3780?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=15293438#comment15293438
]
ASF GitHub Bot commented on FLINK-3780:

Github user vasia commented on a diff in the pull request:
https://github.com/apache/flink/pull/1980#discussion_r64049343
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -2055,22 +2055,22 @@ vertex and edge in the output graph stores the common group value and the number
### Jaccard Index
#### Overview
-The Jaccard Index measures the similarity between vertex neighborhoods. Scores range from 0.0 (no common neighbors) to
-1.0 (all neighbors are common).
+The Jaccard Index measures the similarity between vertex neighborhoods and is computed as the number of shared neighbors
+divided by the number of distinct neighbors. Scores range from 0.0 (no shared neighbors) to 1.0 (all neighbors are
+shared).
#### Details
-Counting common neighbors for pairs of vertices is equivalent to counting the two-paths consisting of two edges
-connecting the two vertices to the common neighbor. The number of distinct neighbors for pairs of vertices is computed
-by storing the sum of degrees of the vertex pair and subtracting the count of common neighbors, which are double-counted
-in the sum of degrees.
+Counting shared neighbors for pairs of vertices is equivalent to counting connecting paths of length two. The number of
+distinct neighbors is computed by storing the sum of degrees of the vertex pair and subtracting the count of shared
+neighbors, which are double-counted in the sum of degrees.
-The algorithm first annotates each edge with the endpoint degree. Grouping on the midpoint vertex, each pair of
-neighbors is emitted with the endpoint degree sum. Grouping on two-paths, the common neighbors are counted.
+The algorithm first annotates each edge with the target vertex's degree. Grouping on the source vertex, each pair of
+neighbors is emitted with the degree sum. Grouping on two-paths, the shared neighbors are counted.
#### Usage
The algorithm takes a simple, undirected graph as input and outputs a `DataSet` of tuples containing two vertex IDs,
-the number of common neighbors, and the number of distinct neighbors. The graph ID type must be `Comparable` and
-`Copyable`.
+the number of shared neighbors, and the number of distinct neighbors. The result class provides a method to compute the
+Jaccard Index score. The graph ID type must be `Comparable` and `Copyable`.
--- End diff --
Here we should also document the output of the algorithm, i.e. the `Result` type and how to get the Jaccard similarity
out of it.
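
To make the request concrete, here is a minimal plain-Java sketch of the computation the diff describes: count two-paths per vertex pair, then derive the distinct-neighbor count from the degree sum. It is not the Gelly implementation and does not use the Flink API. The class `JaccardSketch`, its nested `Result` type, and the method `jaccardIndexScore()` are hypothetical names chosen for illustration; the accessors on the actual `Result` type in the pull request may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; all names here are hypothetical, not the Gelly API.
public class JaccardSketch {

    /** Hypothetical result type: a vertex pair and its two neighbor counts. */
    public static final class Result {
        public final int u;
        public final int v;
        public final int sharedNeighbors;
        public final int distinctNeighbors;

        Result(int u, int v, int sharedNeighbors, int distinctNeighbors) {
            this.u = u;
            this.v = v;
            this.sharedNeighbors = sharedNeighbors;
            this.distinctNeighbors = distinctNeighbors;
        }

        /** Jaccard Index score: shared neighbors divided by distinct neighbors. */
        public double jaccardIndexScore() {
            return (double) sharedNeighbors / distinctNeighbors;
        }
    }

    /**
     * @param adjacency neighbor lists of a simple, undirected graph
     *                  (every edge appears in both endpoints' lists)
     */
    public static List<Result> compute(Map<Integer, List<Integer>> adjacency) {
        // Count two-paths: each pair of neighbors of a middle vertex shares that vertex.
        Map<List<Integer>, Integer> shared = new HashMap<>();
        for (List<Integer> neighbors : adjacency.values()) {
            for (int i = 0; i < neighbors.size(); i++) {
                for (int j = i + 1; j < neighbors.size(); j++) {
                    int a = Math.min(neighbors.get(i), neighbors.get(j));
                    int b = Math.max(neighbors.get(i), neighbors.get(j));
                    shared.merge(Arrays.asList(a, b), 1, Integer::sum);
                }
            }
        }

        List<Result> results = new ArrayList<>();
        for (Map.Entry<List<Integer>, Integer> entry : shared.entrySet()) {
            int u = entry.getKey().get(0);
            int v = entry.getKey().get(1);
            int count = entry.getValue();
            // Shared neighbors are counted twice in the degree sum, so subtract them once.
            int degreeSum = adjacency.get(u).size() + adjacency.get(v).size();
            results.add(new Result(u, v, count, degreeSum - count));
        }
        return results;
    }

    public static void main(String[] args) {
        // Triangle 0-1-2 with a tail edge 2-3.
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(0, Arrays.asList(1, 2));
        graph.put(1, Arrays.asList(0, 2));
        graph.put(2, Arrays.asList(0, 1, 3));
        graph.put(3, Arrays.asList(2));

        for (Result r : compute(graph)) {
            System.out.printf("(%d, %d) shared=%d distinct=%d score=%.3f%n",
                    r.u, r.v, r.sharedNeighbors, r.distinctNeighbors, r.jaccardIndexScore());
        }
    }
}
```

In the example graph, vertices 0 and 3 share one neighbor (vertex 2) and have a degree sum of 3, so the pair has two distinct neighbors and a score of 0.5. Only pairs with at least one shared neighbor are emitted, which matches the "all non-zero similarity scores" goal of the issue.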
> Jaccard Similarity
> 
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK3780
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Affects Versions: 1.1.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity scores. This algorithm is similar to {{TriangleListing}} but instead of joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which relies on {{Graph.getTriplets()}} so only computes similarity scores for neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as Adamic-Adar similarity where the sum of endpoint degrees is replaced by the degree of the middle vertex.
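
The Adamic-Adar modification mentioned in the description can be sketched in plain Java as follows. This is an illustration under assumptions, not code from the pull request: the class name `AdamicAdarSketch` and its `compute` method are hypothetical, and the weight `1 / ln(degree)` per shared neighbor is the usual Adamic-Adar definition rather than something stated in the issue. Compared with counting two-paths, each two-path contributes a weight derived from the degree of its middle vertex instead of a count of one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; names are hypothetical, not the Gelly API.
public class AdamicAdarSketch {

    /**
     * @param adjacency neighbor lists of a simple, undirected graph
     * @return Adamic-Adar score keyed by the vertex pair [smaller ID, larger ID]
     */
    public static Map<List<Integer>, Double> compute(Map<Integer, List<Integer>> adjacency) {
        Map<List<Integer>, Double> scores = new HashMap<>();
        for (List<Integer> neighbors : adjacency.values()) {
            // A middle vertex needs at least two neighbors to form a two-path,
            // which also guarantees ln(degree) > 0.
            if (neighbors.size() < 2) {
                continue;
            }
            // Weight of every two-path through this middle vertex.
            double weight = 1.0 / Math.log(neighbors.size());
            for (int i = 0; i < neighbors.size(); i++) {
                for (int j = i + 1; j < neighbors.size(); j++) {
                    int a = Math.min(neighbors.get(i), neighbors.get(j));
                    int b = Math.max(neighbors.get(i), neighbors.get(j));
                    scores.merge(Arrays.asList(a, b), weight, Double::sum);
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Four-cycle 0-1-2-3-0: vertices 0 and 2 share the neighbors 1 and 3.
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(0, Arrays.asList(1, 3));
        graph.put(1, Arrays.asList(0, 2));
        graph.put(2, Arrays.asList(1, 3));
        graph.put(3, Arrays.asList(0, 2));

        System.out.println(compute(graph));
    }
}
```

In the four-cycle, the pair (0, 2) has two shared neighbors of degree 2, so its score is `2 / ln(2)`; the same holds for the pair (1, 3).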

This message was sent by Atlassian JIRA
(v6.3.4#6332)
