mahout-dev mailing list archives

From "Sebastian Schelter (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-478) Do we need normalize SimilarityMatrixEntryKey?
Date Fri, 13 Aug 2010 20:35:18 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898408#action_12898408 ]

Sebastian Schelter commented on MAHOUT-478:
-------------------------------------------

I'd say so too.

> Do we need  normalize SimilarityMatrixEntryKey?
> -----------------------------------------------
>
>                 Key: MAHOUT-478
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-478
>             Project: Mahout
>          Issue Type: Question
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>
> In org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey
> {code}
> public static class SimilarityMatrixEntryKeyComparator extends WritableComparator {
>     protected SimilarityMatrixEntryKeyComparator() {
>       super(SimilarityMatrixEntryKey.class, true);
>     }
>     @Override
>     public int compare(WritableComparable a, WritableComparable b) {
>       SimilarityMatrixEntryKey key1 = (SimilarityMatrixEntryKey) a;
>       SimilarityMatrixEntryKey key2 = (SimilarityMatrixEntryKey) b;
>       int result = compare(key1.row, key2.row);
>       if (result == 0) {
>         result = -1 * compare(key1.value, key2.value);
>       }
>       return result;
>     }
>     protected static int compare(long a, long b) {
>       return (a == b) ? 0 : (a < b) ? -1 : 1;
>     }
>     protected static int compare(double a, double b) {
>       return (a == b) ? 0 : (a < b) ? -1 : 1;
>     }
>   }
> {code}
> We use a double as one part of the key.
> Because a double can take many distinct values, pairwiseSimilarity may end up with many groups,
> and the number of groups is out of our control.
> For example, (ItemA, 0.1), (ItemA, 0.11), (ItemA, 0.01), (ItemA, 0.1), (ItemA, 0.001), (ItemA, 0.0011) are treated as different groups.
> Doubles are also inexact, so it is hard to compare them for equality.
> So can we normalize the similarityValue?
> We could multiply every similarityValue by 100, 1000, or some other number and convert it to an integer.
> If necessary, we can convert the values back to doubles at the end.
>  
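
The normalization proposed in the issue could look roughly like the sketch below. It is only an illustration under stated assumptions: the scale factor of 1000 is one of the values suggested above, and the class and method names (NormalizeSimilarityExample, normalize, denormalize) are hypothetical and not part of Mahout.

```java
import java.util.Map;
import java.util.TreeMap;

public class NormalizeSimilarityExample {

  // Assumed scale factor: 1000 keeps three decimal places of the similarity value.
  static final long SCALE = 1000L;

  // Convert a similarity value to a fixed-point long by scaling and rounding.
  static long normalize(double similarity) {
    return Math.round(similarity * SCALE);
  }

  // Convert a fixed-point value back to a double, e.g. at the end of the job.
  static double denormalize(long scaled) {
    return (double) scaled / SCALE;
  }

  public static void main(String[] args) {
    // The similarity values from the example in the issue description.
    double[] values = {0.1, 0.11, 0.01, 0.1, 0.001, 0.0011};

    // Count how many values fall into each normalized group.
    Map<Long, Integer> groups = new TreeMap<>();
    for (double v : values) {
      groups.merge(normalize(v), 1, Integer::sum);
    }
    System.out.println("groups after normalization: " + groups);

    // Long keys can be compared exactly, unlike the raw doubles.
    System.out.println("0.1 + 0.2 == 0.3 as doubles: " + ((0.1 + 0.2) == 0.3));
    System.out.println("equal after normalization: "
        + (normalize(0.1 + 0.2) == normalize(0.3)));
  }
}
```

With a scale of 1000, the values 0.001 and 0.0011 fall into the same group, so the choice of scale factor directly controls how many groups exist and how much precision is lost.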

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

