mahout-dev mailing list archives

From "Saikat Kanjilal (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAHOUT-1539) Implement affinity matrix computation in Mahout DSL
Date Thu, 02 Apr 2015 05:38:53 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392179#comment-14392179 ]

Saikat Kanjilal edited comment on MAHOUT-1539 at 4/2/15 5:38 AM:
-----------------------------------------------------------------

Enough with high-level concepts already :), so I took the next logical step.

I'm not ready to include my code in the Mahout master repo yet, so I created my own repo
and started a sample implementation there. It contains a first cut of LocalitySensitiveHashing,
implemented using Euclidean distance only; as a first step, the code at least compiles:

https://github.com/skanjila/AffinityMatrix
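For readers unfamiliar with LSH under Euclidean distance, here is a minimal plain-Scala sketch of the usual p-stable (Gaussian projection) hash family, h(v) = floor((a . v + b) / w). It is an illustration only: the class name `EuclideanLsh` and its parameters (`numHashes`, bucket width `w`, `seed`) are hypothetical and are not taken from the linked repo or from Mahout.

```scala
import scala.util.Random

// Hypothetical sketch of a Euclidean (p-stable) LSH hash family:
// h_k(v) = floor((a_k . v + b_k) / w), with a_k ~ N(0, I) and b_k ~ Uniform[0, w).
final class EuclideanLsh(dim: Int, numHashes: Int, w: Double, seed: Long) {
  require(dim > 0 && numHashes > 0 && w > 0.0)

  private val rng = new Random(seed)
  // One random Gaussian projection vector per hash function.
  private val projections: Array[Array[Double]] =
    Array.fill(numHashes, dim)(rng.nextGaussian())
  // One random offset in [0, w) per hash function.
  private val offsets: Array[Double] =
    Array.fill(numHashes)(rng.nextDouble() * w)

  private def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  /** Concatenated hash values; vectors with equal signatures share a bucket. */
  def signature(v: Array[Double]): Vector[Int] = {
    require(v.length == dim, "dimension mismatch")
    Vector.tabulate(numHashes) { k =>
      math.floor((dot(projections(k), v) + offsets(k)) / w).toInt
    }
  }

  /** Groups row indices by signature; rows in the same bucket are candidate neighbours. */
  def buckets(rows: Seq[Array[Double]]): Map[Vector[Int], Seq[Int]] =
    rows.zipWithIndex
      .groupBy { case (v, _) => signature(v) }
      .map { case (sig, members) => sig -> members.map(_._2) }
}
```

Because a single table can split nearby points across bucket boundaries, LSH implementations usually repeat this over several independently seeded tables and take the union of colliding pairs.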


TBD
1) Implement unit tests, and potentially integration tests that measure the performance of this
2) Once LSH is fully tested, I will implement the AffinityMatrix piece on top of it
3) I will then add more unit tests for AffinityMatrix
4) I will then add CosineDistance and ManhattanDistance as configurable parameters
5) I will integrate this with the Spark API, specifically invoking the SparkContext and using
Spark's broadcast mechanism across the cluster as appropriate
6) I will merge this into my checked-out Mahout branch
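As a rough picture of the AffinityMatrix step in item 2, here is a plain-Scala sketch assuming a Gaussian (RBF) kernel, A(i, j) = exp(-d(x_i, x_j)^2 / (2 sigma^2)) with a zero diagonal, as is common for spectral clustering. The object name `AffinitySketch`, the `sigma` parameter and the pluggable `distance` function are hypothetical, and the sketch uses local arrays rather than the Mahout DSL's distributed matrix types. It fills the full n x n matrix; an LSH-backed version would presumably evaluate only pairs that share a bucket and leave the rest at zero.

```scala
// Hypothetical sketch: dense Gaussian affinity matrix
// A(i)(j) = exp(-d(x_i, x_j)^2 / (2 * sigma^2)), with A(i)(i) = 0.
object AffinitySketch {
  type Distance = (Array[Double], Array[Double]) => Double

  // Default distance: plain Euclidean distance.
  val euclidean: Distance = (a, b) => {
    require(a.length == b.length, "dimension mismatch")
    math.sqrt(a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum)
  }

  def affinityMatrix(rows: IndexedSeq[Array[Double]],
                     sigma: Double,
                     distance: Distance = euclidean): Array[Array[Double]] = {
    require(sigma > 0.0, "sigma must be positive")
    val n = rows.length
    val a = Array.ofDim[Double](n, n) // diagonal stays 0.0
    for (i <- 0 until n; j <- i + 1 until n) {
      val d = distance(rows(i), rows(j))
      val v = math.exp(-(d * d) / (2.0 * sigma * sigma))
      // The kernel is symmetric, so compute each pair once and fill both halves.
      a(i)(j) = v
      a(j)(i) = v
    }
    a
  }
}
```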


Early feedback on the code would be greatly appreciated; watch my repo for frequent changes.



> Implement affinity matrix computation in Mahout DSL
> ---------------------------------------------------
>
>                 Key: MAHOUT-1539
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1539
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.9
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>              Labels: DSL, scala, spark
>             Fix For: 0.10.1
>
>         Attachments: ComputeAffinities.scala
>
>
> This has the same goal as MAHOUT-1506, but rather than code the pairwise computations
> in MapReduce, this will be done in the Mahout DSL.
> An orthogonal issue is the format of the raw input (vectors, text, images, SequenceFiles),
> and how the user specifies the distance equation and any associated parameters.
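On the question of how the user specifies the distance equation, one option is a small sealed type with Euclidean, Manhattan and cosine members, selected by a user-supplied name. This is a plain-Scala sketch only: `PairwiseDistance` and `fromName` are hypothetical names, not part of Mahout or the linked repo, and any associated parameters (such as a kernel width) would still need their own configuration.

```scala
// Hypothetical sketch of user-selectable pairwise distance measures.
sealed trait PairwiseDistance {
  // Shared dimension check; each measure implements `compute`.
  final def apply(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    compute(a, b)
  }
  protected def compute(a: Array[Double], b: Array[Double]): Double
}

object PairwiseDistance {
  case object Euclidean extends PairwiseDistance {
    protected def compute(a: Array[Double], b: Array[Double]): Double =
      math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
  }

  case object Manhattan extends PairwiseDistance {
    protected def compute(a: Array[Double], b: Array[Double]): Double =
      a.zip(b).map { case (x, y) => math.abs(x - y) }.sum
  }

  // Cosine distance = 1 - cosine similarity; undefined for zero vectors.
  case object Cosine extends PairwiseDistance {
    protected def compute(a: Array[Double], b: Array[Double]): Double = {
      val dot = a.zip(b).map { case (x, y) => x * y }.sum
      val na = math.sqrt(a.map(x => x * x).sum)
      val nb = math.sqrt(b.map(x => x * x).sum)
      require(na > 0.0 && nb > 0.0, "cosine distance is undefined for zero vectors")
      1.0 - dot / (na * nb)
    }
  }

  /** Case-insensitive lookup by name, e.g. from a command-line flag or config file. */
  def fromName(name: String): Option[PairwiseDistance] = name.trim.toLowerCase match {
    case "euclidean" => Some(Euclidean)
    case "manhattan" => Some(Manhattan)
    case "cosine"    => Some(Cosine)
    case _           => None
  }
}
```

Returning an `Option` from `fromName` lets the caller report an unknown measure name instead of silently falling back to a default.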



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
