mahout-dev mailing list archives

From "Han Hui Wen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAHOUT-478) Do we need normalize SimilarityMatrixEntryKey?
Date Fri, 13 Aug 2010 13:36:21 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Han Hui Wen  updated MAHOUT-478:
--------------------------------

    Description: 
In org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey
{code}
public static class SimilarityMatrixEntryKeyComparator extends WritableComparator {

    protected SimilarityMatrixEntryKeyComparator() {
      super(SimilarityMatrixEntryKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
      SimilarityMatrixEntryKey key1 = (SimilarityMatrixEntryKey) a;
      SimilarityMatrixEntryKey key2 = (SimilarityMatrixEntryKey) b;

      int result = compare(key1.row, key2.row);
      if (result == 0) {
        result = -1 * compare(key1.value, key2.value);
      }
      return result;
    }

    protected static int compare(long a, long b) {
      return (a == b) ? 0 : (a < b) ? -1 : 1;
    }

    protected static int compare(double a, double b) {
      return (a == b) ? 0 : (a < b) ? -1 : 1;
    }
  }
{code}

We use a double as one part of the key.
Because a double can take many possible values, pairwiseSimilarity may end up with many groups,
and the number of groups is out of our control.
For example, in (ItemA, 0.1), (ItemA, 0.11), (ItemA, 0.01), (ItemA, 0.1), (ItemA, 0.001), (ItemA, 0.0011),
each distinct value forms a different group.
Doubles are also inexact, so it is hard to compare two doubles for equality.
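
To illustrate the equality concern, here is a minimal sketch that reuses the double comparison from the comparator above. The class name DoubleKeyDemo is hypothetical and not part of Mahout. It only shows that two values that are mathematically equal can compare as different once they are computed in floating point.

```java
/** Hypothetical demo class, not part of Mahout. */
final class DoubleKeyDemo {

  // Same logic as the comparator's double comparison quoted above.
  static int compare(double a, double b) {
    return (a == b) ? 0 : (a < b) ? -1 : 1;
  }

  public static void main(String[] args) {
    double computed = 0.1 + 0.2; // stored as 0.30000000000000004
    double literal = 0.3;
    // A non-zero result means the two values would be treated as different keys.
    System.out.println(compare(computed, literal)); // prints 1
  }
}
```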

So can we normalize the similarityValue?
We could multiply every similarityValue by 100, 1000, or some other number and convert it to an integer.

If necessary, we can convert the values back to doubles at the end.
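
The proposal above could look roughly like the following sketch. SimilarityQuantizer, SCALE, toKey, toSimilarity, and groupCounts are hypothetical names, not existing Mahout code, and the scale factor of 1000 is only an assumption. Rounding is used here, so values closer together than 1/SCALE can collapse into the same key.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Mahout: maps similarity values to fixed-point longs. */
final class SimilarityQuantizer {

  // Assumed scale factor: 1000 keeps three decimal places.
  static final long SCALE = 1000L;

  /** Converts a similarity value to a fixed-point long by scaling and rounding. */
  static long toKey(double similarity) {
    return Math.round(similarity * SCALE);
  }

  /** Converts a fixed-point key back to an approximate double. */
  static double toSimilarity(long key) {
    return (double) key / SCALE;
  }

  /** Counts how many input values land on each quantized key. */
  static Map<Long, Integer> groupCounts(double[] similarities) {
    Map<Long, Integer> counts = new TreeMap<>();
    for (double s : similarities) {
      counts.merge(toKey(s), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    double[] values = {0.1, 0.11, 0.01, 0.1, 0.001, 0.0011};
    System.out.println(groupCounts(values));
  }
}
```

With the six example values, the raw doubles contain five distinct values, while the quantized keys form four groups, because 0.001 and 0.0011 both map to key 1. A larger scale factor would keep them apart at the cost of more groups.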
 


  was:
In org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey
{code}
public static class SimilarityMatrixEntryKeyComparator extends WritableComparator {

    protected SimilarityMatrixEntryKeyComparator() {
      super(SimilarityMatrixEntryKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
      SimilarityMatrixEntryKey key1 = (SimilarityMatrixEntryKey) a;
      SimilarityMatrixEntryKey key2 = (SimilarityMatrixEntryKey) b;

      int result = compare(key1.row, key2.row);
      if (result == 0) {
        result = -1 * compare(key1.value, key2.value);
      }
      return result;
    }

    protected static int compare(long a, long b) {
      return (a == b) ? 0 : (a < b) ? -1 : 1;
    }

    protected static int compare(double a, double b) {
      return (a == b) ? 0 : (a < b) ? -1 : 1;
    }
  }
{code}

We use a double as one part of the key.
Because a double can take many possible values, pairwiseSimilarity may end up with many groups.
For example, in (ItemA, 0.1), (ItemA, 0.11), (ItemA, 0.01), (ItemA, 0.1), (ItemA, 0.001), (ItemA, 0.0011),
each distinct value forms a different group.
Doubles are also inexact, so it is hard to compare two doubles for equality.

So can we normalize the similarityValue?
We could multiply every similarityValue by 100, 1000, or some other number and convert it to an integer.

If necessary, we can convert the values back to doubles at the end.
 



> Do we need  normalize SimilarityMatrixEntryKey?
> -----------------------------------------------
>
>                 Key: MAHOUT-478
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-478
>             Project: Mahout
>          Issue Type: Question
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>             Fix For: 0.4

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

