flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3780) Jaccard Similarity
Date Thu, 19 May 2016 14:41:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291192#comment-15291192 ]

ASF GitHub Bot commented on FLINK-3780:

Github user vasia commented on a diff in the pull request:

    --- Diff: docs/apis/batch/libs/gelly.md ---
    @@ -2051,6 +2052,26 @@ The algorithm takes a directed, vertex (and possibly edge) attributed graph as i
     vertex represents a group of vertices and each edge represents a group of edges from the input graph. Furthermore, each
     vertex and edge in the output graph stores the common group value and the number of represented
    +### Jaccard Index
    +#### Overview
    +The Jaccard Index measures the similarity between vertex neighborhoods. Scores range from 0.0 (no common neighbors) to
    +1.0 (all neighbors are common).
    +#### Details
    +Counting common neighbors for pairs of vertices is equivalent to counting the two-paths consisting of two edges
    --- End diff --
    By "two-paths", do you mean triads, i.e. open triangles?

> Jaccard Similarity
> ------------------
>                 Key: FLINK-3780
>                 URL: https://issues.apache.org/jira/browse/FLINK-3780
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>             Fix For: 1.1.0
> Implement a Jaccard Similarity algorithm computing all non-zero similarity scores. This algorithm is similar to {{TriangleListing}}, but instead of joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}}, which relies on {{Graph.getTriplets()}} and so only computes similarity scores for neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as Adamic-Adar similarity, where the sum of endpoint degrees is replaced by the degree of the middle vertex.
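
The two-path counting described in the issue can be illustrated with a small, single-machine sketch. This is not the Gelly implementation: the function name `jaccard_scores` and the toy edge list are hypothetical, and the sketch assumes an undirected simple graph with open neighborhoods (a vertex is not its own neighbor). Each middle vertex emits one two-path per pair of its neighbors, and the score follows from the count of shared neighbors and the endpoint degrees.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_scores(edges):
    """Return {(u, v): score} for every vertex pair with at least one shared neighbor."""
    # Build undirected adjacency sets, ignoring self-loops and duplicate edges.
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)

    # Each middle vertex w contributes one two-path u - w - v
    # for every unordered pair (u, v) of its neighbors.
    shared = defaultdict(int)
    for w, adjacent in neighbors.items():
        for u, v in combinations(sorted(adjacent), 2):
            shared[(u, v)] += 1

    # |N(u) ∪ N(v)| = deg(u) + deg(v) - |N(u) ∩ N(v)|
    return {
        (u, v): count / (len(neighbors[u]) + len(neighbors[v]) - count)
        for (u, v), count in shared.items()
    }


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
    toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
    for pair, score in sorted(jaccard_scores(toy_edges).items()):
        print(pair, round(score, 3))
```

In this toy graph the pair (1, 4) receives a score even though the two vertices are not adjacent, which is the neighbors-of-neighbors case the issue says a triplet-based approach misses. The pair (3, 4) is adjacent but shares no neighbor, so it has a zero score and is omitted. Two-paths are counted whether or not their endpoints are themselves connected.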

This message was sent by Atlassian JIRA
