mahout-dev mailing list archives

From "Sebastian Schelter (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAHOUT-803) Complete minsize constraints for similarity measures used in RowSimilarityJob
Date Fri, 09 Sep 2011 08:03:08 GMT
Complete minsize constraints for similarity measures used in RowSimilarityJob
-----------------------------------------------------------------------------

                 Key: MAHOUT-803
                 URL: https://issues.apache.org/jira/browse/MAHOUT-803
             Project: Mahout
          Issue Type: Task
          Components: Math
    Affects Versions: 0.6
            Reporter: Sebastian Schelter
            Assignee: Sebastian Schelter


The latest implementation of RowSimilarityJob allows specifying a threshold for the minimum
similarity value of the resulting row pairs.

A measure can specify min-size constraints via VectorSimilarityMeasure.consider(...). These let it prune
candidate pairs early, using statistics computed once for each individual row.
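
To illustrate the idea, here is a minimal Java sketch of early pruning. It assumes each row is summarized by hypothetical per-row statistics (non-zero count and maximum value) and that a measure exposes a predicate playing the role of consider(...). The names RowStats, PruningMeasure and prune are illustrative only, not Mahout's actual signatures. Each pair is checked against the cheap per-row statistics before any similarity is computed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of early candidate-pair pruning; not Mahout's actual API. */
final class EarlyPruningSketch {

  /** Statistics computed once per row, before any pair is scored (hypothetical type). */
  static final class RowStats {
    final int numNonZero;
    final double maxValue;

    RowStats(int numNonZero, double maxValue) {
      this.numNonZero = numNonZero;
      this.maxValue = maxValue;
    }
  }

  /** Hypothetical stand-in for a measure's consider(...) hook. */
  interface PruningMeasure {
    /** Returns false if the pair can never reach the similarity threshold. */
    boolean consider(RowStats a, RowStats b, double threshold);
  }

  /** Keeps only the candidate pairs (row index pairs) that the measure does not rule out. */
  static List<int[]> prune(RowStats[] stats, List<int[]> candidates,
                           PruningMeasure measure, double threshold) {
    List<int[]> kept = new ArrayList<>();
    for (int[] pair : candidates) {
      // Only cheap per-row statistics are consulted here; no similarity is computed yet.
      if (measure.consider(stats[pair[0]], stats[pair[1]], threshold)) {
        kept.add(pair);
      }
    }
    return kept;
  }
}
```

With a cooccurrence-style measure such as `(a, b, t) -> Math.min(a.numNonZero, b.numNonZero) >= t` and a threshold of 5, a pair involving a row with only 3 non-zero entries is dropped without ever being scored.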

For example, if cooccurrence count is used as the similarity measure and a threshold of 5 is set,
then all row pairs in which one of the vectors has fewer than 5 non-zero components can be discarded.
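
This works because two rows can share at most as many non-zero columns as the sparser row has, so min(nnzA, nnzB) is an upper bound on their cooccurrence count. Below is a minimal Java sketch, assuming rows are given as sorted arrays of non-zero column indices. The class and method names are hypothetical, not Mahout's code. Passing the check is necessary but not sufficient: the exact count still has to be computed for surviving pairs.

```java
/** Hypothetical sketch of the cooccurrence min-size constraint; names are illustrative. */
final class CooccurrenceMinSize {

  /** Counts columns where both rows are non-zero; inputs are sorted non-zero column indices. */
  static int cooccurrences(int[] nonZeroA, int[] nonZeroB) {
    int i = 0;
    int j = 0;
    int count = 0;
    // Merge-style walk over two sorted index lists.
    while (i < nonZeroA.length && j < nonZeroB.length) {
      if (nonZeroA[i] == nonZeroB[j]) {
        count++;
        i++;
        j++;
      } else if (nonZeroA[i] < nonZeroB[j]) {
        i++;
      } else {
        j++;
      }
    }
    return count;
  }

  /** The count can never exceed min(nnzA, nnzB), so a smaller minimum rules the pair out. */
  static boolean canReachThreshold(int numNonZeroA, int numNonZeroB, double threshold) {
    return Math.min(numNonZeroA, numNonZeroB) >= threshold;
  }
}
```

For a row with 4 non-zero entries and threshold 5, `canReachThreshold(4, 8, 5)` is false, so the pair is skipped without calling `cooccurrences` at all.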

These min-size constraints are still missing for CityBlockSimilarity, LoglikelihoodSimilarity
and EuclideanDistanceSimilarity.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
