incubator-hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Trivial Update of "TraditionalCollaborativeFiltering" by udanax
Date Wed, 03 Sep 2008 03:05:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/hama/TraditionalCollaborativeFiltering

------------------------------------------------------------------------------
+ [[TableOfContents(4)]]
+ ----
+ 
+ == Abstract ==
+ 
+ Collaborative filtering is an important personalization method in recommender systems for
internet commerce. Basing traditional collaborative filtering (TCF) on absolute item ratings
is problematic: it is difficult for users to rate items accurately on an absolute scale, and
different users have different rating distributions. This tutorial shows how to use Hama to
calculate TCF.
+ 
+ == Implementation ==
+ 
+ === Build a user-by-item matrix and set entries from the raw data ===
+ 
+ ...
+ 
+ === Get the pairs of all row key combinations without repetition ===
+ Similarity is symmetric, so the same pair does not have to be recalculated in reversed order.[[BR]]
+ ~-ex) similar(UserA, UserB) == similar(UserB, UserA).-~
+ 
+ For three row keys, this step returns {{1, 2}, {1, 3}, {2, 3}} and discards {{2, 1}, {3,
1}, {3, 2}} from the full set of ordered pairs. Since there are mC2 combinations (m: number
of keys), the work can be balanced by assigning about mC2 / N pairs to each reducer (N: number
of reducers), with something like:
+ 
+ {{{
+ int partition(int i, int j, int m, int N) { // pair (i, j), 1 <= i < j <= m; N is num reducers
+  // find the data per reducer, rounded up so that every pair fits
+  int total = m * (m - 1) / 2; // mC2, assuming m is known
+  int dataPerRed = (total + N - 1) / N;
+  int prev_sum = 0;
+  // calculate the total combinations contributed by previous indexes
+  for (int k = 1; k < i; k++) {
+   prev_sum += m - k; // index k pairs with keys k+1 .. m
+  }
+  prev_sum += j - i - 1; // zero-based offset of (i, j) among the pairs of index i
+  return prev_sum / dataPerRed; // reducer index in [0, N)
+ }
+ }}}
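
The balancing idea above can be sketched in plain Java. This is only an illustration under stated assumptions, not the page author's implementation: the row keys are taken to be the integers 1..m, m is known up front, and `PairPartitioner`, `pairs`, `rank` and `partition` are hypothetical names that are not part of Hama or Hadoop. The sketch assigns a reducer by dividing the pair's zero-based position in the enumeration by the per-reducer quota.

```java
import java.util.ArrayList;
import java.util.List;

public class PairPartitioner {

    /** Enumerates all pairs (i, j) with 1 <= i < j <= m, i.e. without reversed duplicates. */
    static List<int[]> pairs(int m) {
        List<int[]> result = new ArrayList<>();
        for (int i = 1; i <= m; i++) {
            for (int j = i + 1; j <= m; j++) {
                result.add(new int[] {i, j});
            }
        }
        return result;
    }

    /** Zero-based position of (i, j) in the order produced by pairs(m). */
    static long rank(int i, int j, int m) {
        // Indexes 1 .. i-1 contribute (m-1) + (m-2) + ... + (m-i+1) pairs.
        long before = (long) (i - 1) * m - (long) i * (i - 1) / 2;
        return before + (j - i - 1);
    }

    /** Reducer index in [0, n) for the pair (i, j); each reducer gets about mC2 / n pairs. */
    static int partition(int i, int j, int m, int n) {
        long total = (long) m * (m - 1) / 2;   // mC2
        long perReducer = (total + n - 1) / n; // ceil(mC2 / n)
        return (int) (rank(i, j, m) / perReducer);
    }

    public static void main(String[] args) {
        int m = 4; // number of row keys (assumed value for the demo)
        int n = 3; // number of reducers (assumed value for the demo)
        for (int[] p : pairs(m)) {
            System.out.println("(" + p[0] + ", " + p[1] + ") -> reducer "
                    + partition(p[0], p[1], m, n));
        }
    }
}
```

With m = 4 and N = 3, the six pairs are spread two per reducer. The closed form in `rank` replaces the summation loop, which matters only when m is large.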
+ 
+ === |a|·|b|cos(q) calculation ===
+ 
+ ...
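
The heading refers to the identity a·b = |a|·|b|cos(q), so the similarity of two users can be read off as cos(q) = a·b / (|a|·|b|). The following is a minimal sketch of that formula only, assuming each user is a dense row of ratings held in a `double[]`. It does not use Hama's matrix API, and `CosineSimilarity` and `cosine` are hypothetical names chosen for illustration.

```java
public class CosineSimilarity {

    /** cos(q) = (a . b) / (|a| * |b|); returns 0 when either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            normA += a[k] * a[k];
            normB += b[k] * b[k];
        }
        if (normA == 0.0 || normB == 0.0) {
            // Assumption: a user with no ratings is treated as similar to nobody.
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] userA = {2, 5, 0}; // made-up ratings for the demo
        double[] userB = {4, 10, 0}; // same direction as userA, twice the magnitude
        double[] userC = {0, 0, 3}; // no items in common with userA
        System.out.println("similar(A, B) = " + cosine(userA, userB));
        System.out.println("similar(A, C) = " + cosine(userA, userC));
    }
}
```

Because the result depends only on the angle between the two rows, a user who rates everything twice as high as another still gets a similarity close to 1.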
+ 
+ === Collect the similarity result of the two users ===
+ 
+ ...
+ 
+ == Pseudo code for TCF with Hama ==
+ 
+ 
  {{{
  import java.math.BigInteger;
  
@@ -26, +70 @@

      }
  
      // 2. Get the pair set of all row key combinations
-     //  So, we don't have to recalculate the same value pair with reversed order.
-     //  ex) similar(UserA, UserB) == similar(UserB, UserA)
-     //  In this case, it is going to return {{1, 2}, {1, 3}, {2, 3}}
-     //   by discarding {{2, 1}, {3, 1}, {3, 2}} from the full possible combination.
      Combination x = new Combination(data.length, 2);
      
      // 3. |a|·|b|cos(q) calculation
