flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3780) Jaccard Similarity
Date Fri, 20 May 2016 14:20:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293429#comment-15293429 ]

ASF GitHub Bot commented on FLINK-3780:

Github user vasia commented on a diff in the pull request:

    --- Diff: docs/apis/batch/libs/gelly.md ---
    @@ -2250,14 +2250,33 @@ graph.run(new TranslateVertexValues(new LongValueAddOffset(vertexCount)));
    -      <td>translate.<br/><strong>TranslateEdgeValues</strong></td>
    +      <td>asm.translate.<br/><strong>TranslateEdgeValues</strong></td>
             <p>Translate edge values using the given <code>TranslateFunction</code>.</p>
     {% highlight java %}
     graph.run(new TranslateEdgeValues(new Nullify()));
     {% endhighlight %}
    +    <tr>
    +      <td>library.similarity.<br/><strong>JaccardIndex</strong></td>
    +      <td>
    +        <p>Measures the similarity between vertex neighborhoods. The Jaccard Index
score is computed as the number of shared neighbors divided by the number of distinct neighbors.
Scores range from 0.0 (no shared neighbors) to 1.0 (all neighbors are shared).</p>
    --- End diff --
    Why did you add this here and not in the "Usage" section of the library method?
    I find it a bit confusing... You describe graph algorithms as building blocks for other
algorithms. Does Jaccard index fall in this category?

> Jaccard Similarity
> ------------------
>                 Key: FLINK-3780
>                 URL: https://issues.apache.org/jira/browse/FLINK-3780
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>             Fix For: 1.1.0
> Implement a Jaccard Similarity algorithm computing all non-zero similarity scores. This
algorithm is similar to {{TriangleListing}} but instead of joining two-paths against an edge
list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which relies on {{Graph.getTriplets()}}
so only computes similarity scores for neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as Adamic-Adar similarity
where the sum of endpoint degrees is replaced by the degree of the middle vertex.
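
To illustrate the two-path counting described in the issue, here is a minimal sketch in plain Java. It is not the Gelly implementation and uses no Flink APIs; the class name JaccardSketch and the methods adjacency and jaccard are hypothetical, and the graph is assumed to be small, undirected, and held in memory. Each middle vertex contributes one two-path to every pair of its neighbors, so the count for a pair equals its number of shared neighbors, and the number of distinct neighbors is the sum of the endpoint degrees minus that count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class for illustration only; not part of Gelly.
class JaccardSketch {

    // Builds an undirected adjacency map from an array of {u, v} edges.
    static Map<Integer, Set<Integer>> adjacency(int[][] edges) {
        Map<Integer, Set<Integer>> adj = new TreeMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new TreeSet<>()).add(e[0]);
        }
        return adj;
    }

    // Returns all non-zero Jaccard scores, keyed by "u,v" with u < v.
    static Map<String, Double> jaccard(Map<Integer, Set<Integer>> adj) {
        // Count two-paths u - w - v: one per middle vertex w shared by u and v.
        Map<String, Integer> shared = new TreeMap<>();
        for (Set<Integer> neighbors : adj.values()) {
            List<Integer> list = new ArrayList<>(neighbors); // sorted ascending
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    shared.merge(list.get(i) + "," + list.get(j), 1, Integer::sum);
                }
            }
        }

        // distinct neighbors = deg(u) + deg(v) - shared neighbors
        Map<String, Double> scores = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : shared.entrySet()) {
            String[] uv = entry.getKey().split(",");
            int degU = adj.get(Integer.parseInt(uv[0])).size();
            int degV = adj.get(Integer.parseInt(uv[1])).size();
            int s = entry.getValue();
            scores.put(entry.getKey(), (double) s / (degU + degV - s));
        }
        return scores;
    }

    public static void main(String[] args) {
        int[][] edges = {{1, 2}, {1, 3}, {2, 3}, {3, 4}};
        System.out.println(jaccard(adjacency(edges)));
    }
}
```

In this example graph, vertices 1 and 4 are not adjacent but share neighbor 3, so the pair still receives a score (0.5). Vertices 3 and 4 are adjacent but share no neighbors, so the pair is omitted, consistent with computing only non-zero scores.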

This message was sent by Atlassian JIRA
