Date: Fri, 20 May 2016 14:20:12 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Commented] (FLINK-3780) Jaccard Similarity

[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293429#comment-15293429 ]

ASF GitHub Bot commented on FLINK-3780:
---------------------------------------

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1980#discussion_r64048334

--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -2250,14 +2250,33 @@ graph.run(new TranslateVertexValues(new LongValueAddOffset(vertexCount)));
- translate.TranslateEdgeValues
+ asm.translate.TranslateEdgeValues

Translate edge values using the given TranslateFunction.

{% highlight java %}
graph.run(new TranslateEdgeValues(new Nullify()));
{% endhighlight %}
+
+
+ library.similarity.JaccardIndex
+
+

Measures the similarity between vertex neighborhoods. The Jaccard Index score is computed as the number of shared neighbors divided by the number of distinct neighbors. Scores range from 0.0 (no shared neighbors) to 1.0 (all neighbors are shared).

--- End diff --

Why did you add this here and not in the "Usage" section of the library method? I find it a bit confusing... You describe graph algorithms as building blocks for other algorithms. Does Jaccard index fall in this category?

> Jaccard Similarity
> ------------------
>
>                 Key: FLINK-3780
>                 URL: https://issues.apache.org/jira/browse/FLINK-3780
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>             Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity scores. This algorithm is similar to {{TriangleListing}} but instead of joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which relies on {{Graph.getTriplets()}} so only computes similarity scores for neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as Adamic-Adar similarity where the sum of endpoint degrees is replaced by the degree of the middle vertex.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
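
For readers of the archive, the two-path counting idea from the issue description can be illustrated with a minimal sketch in plain Java. This is not the Gelly implementation and uses no Flink API: the class name `JaccardSketch`, the method `scores`, and the `"u,v"` key format are hypothetical, and the graph is assumed to be simple and undirected (no self-loops, no duplicate edges in the sense that repeated edges collapse into one). Each middle vertex contributes one shared neighbor to every pair of its neighbors, and the number of distinct neighbors is the sum of the endpoint degrees minus the shared count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Flink or Gelly.
final class JaccardSketch {

    /**
     * Returns the Jaccard score for every vertex pair that shares at least
     * one neighbor. Keys have the form "u,v" with u < v.
     * Assumes a simple undirected graph given as {source, target} pairs.
     */
    public static Map<String, Double> scores(int[][] edges) {
        // Build an undirected adjacency set per vertex.
        Map<Integer, Set<Integer>> neighbors = new TreeMap<>();
        for (int[] e : edges) {
            neighbors.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
            neighbors.computeIfAbsent(e[1], k -> new TreeSet<>()).add(e[0]);
        }

        // Count two-paths u - w - v: the middle vertex w is a shared
        // neighbor of every pair (u, v) drawn from its adjacency set.
        Map<String, Integer> shared = new TreeMap<>();
        for (Set<Integer> adjacent : neighbors.values()) {
            List<Integer> list = new ArrayList<>(adjacent); // sorted ascending
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    shared.merge(list.get(i) + "," + list.get(j), 1, Integer::sum);
                }
            }
        }

        // distinct neighbors = deg(u) + deg(v) - shared neighbors
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : shared.entrySet()) {
            String[] pair = entry.getKey().split(",");
            int degreeU = neighbors.get(Integer.parseInt(pair[0])).size();
            int degreeV = neighbors.get(Integer.parseInt(pair[1])).size();
            int common = entry.getValue();
            result.put(entry.getKey(), (double) common / (degreeU + degreeV - common));
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {2, 3}};
        scores(edges).forEach((pair, score) -> System.out.println(pair + " -> " + score));
    }
}
```

In the sample graph, vertices 0 and 3 are not adjacent but share neighbor 2, so the pair still receives a score (0.5). This is the neighbors-of-neighbors case that the issue says a triplet-based approach misses.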