mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: TreeBasedRecommenders(Deprecated?)
Date Wed, 11 Jun 2014 17:45:37 GMT
> 
> On Jun 10, 2014, at 6:13 PM, Sahil Sharma <ssahil08@gmail.com> wrote:
> 
> Hi,
> 
> Better yet (maybe controversial since I don’t know the mathematical justification for
this) but you could cluster the indicator matrix of items by similar items. This is at least
clustering “important” similar items.
> 
> I'm sorry if what I said was interpreted as user-based clustering, I meant Item-based
clustering, like you pointed out !

The indicator matrix is item by item, so this will cluster items.

If you want to cluster the input matrix by item, just transpose it and cluster. The rows will
be items and the columns users, so you will get clustered items. I’m dubious of this because
of how sparse it is, and you’ll also have uninteresting interactions because it hasn’t
been scrubbed by the process that creates the indicator matrix (RowSimilarityJob).
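
A rough sketch of the transpose step, assuming the interactions fit in memory as a sparse map from user ID to the set of item IDs that user touched. The class and method names are made up for illustration and are not Mahout API. The result has one row per item with users as columns, ready to hand to whatever clustering code you use:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: not part of Mahout.
class TransposeInteractions {

    // Input:  userID -> itemIDs the user interacted with (user-by-item rows).
    // Output: itemID -> userIDs who interacted with it (item-by-user rows).
    static Map<Long, Set<Long>> transpose(Map<Long, Set<Long>> userToItems) {
        Map<Long, Set<Long>> itemToUsers = new TreeMap<>();
        for (Map.Entry<Long, Set<Long>> row : userToItems.entrySet()) {
            for (Long itemID : row.getValue()) {
                itemToUsers.computeIfAbsent(itemID, k -> new TreeSet<>()).add(row.getKey());
            }
        }
        return itemToUsers;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> userToItems = new TreeMap<>();
        userToItems.put(1L, new TreeSet<>(Arrays.asList(10L, 11L)));
        userToItems.put(2L, new TreeSet<>(Arrays.asList(11L)));
        System.out.println(transpose(userToItems));
    }
}
```

This does nothing about the sparsity or the unscrubbed interactions mentioned above; it only flips rows and columns.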

Still, I’d shy away from clustering. It’s hard to get right. I’d try the similarity approach
below first.

> But it is even easier than clustering: if you know a couple of items the user has preferred,
just get the most similar to those directly from the indicator matrix. The indicator matrix
is organized by an item per row and each row has similar items by strength of similarity.
Add all the rows the user has interacted with (using the strength values), sort, and recommend
the top n. The in-memory item-based recommender will give you the similar items for each item
the user preferred; all you need to do is add and sort.
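
The add-and-sort step can be sketched roughly as follows, assuming the indicator matrix is available in memory as a map from item ID to a row of (similar item ID, strength) pairs. The class name and data layout are illustrative, not Mahout API, and skipping items the user already has is an extra assumption that the description above does not state:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain in-memory stand-in for the indicator matrix.
class IndicatorRowSum {

    // indicator: itemID -> (similar itemID -> similarity strength).
    // userItems: the items the user has interacted with.
    static List<Long> recommend(Map<Long, Map<Long, Double>> indicator,
                                List<Long> userItems, int n) {
        Map<Long, Double> scores = new HashMap<>();
        for (Long itemID : userItems) {
            Map<Long, Double> row = indicator.get(itemID);
            if (row == null) {
                continue; // no similarity data for this item
            }
            for (Map.Entry<Long, Double> e : row.entrySet()) {
                // Assumption: do not recommend what the user already has.
                if (!userItems.contains(e.getKey())) {
                    scores.merge(e.getKey(), e.getValue(), Double::sum);
                }
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        // Highest summed strength first; ties broken by item ID.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Long.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> indicator = new HashMap<>();
        Map<Long, Double> row1 = new HashMap<>();
        row1.put(3L, 0.9);
        row1.put(4L, 0.2);
        Map<Long, Double> row2 = new HashMap<>();
        row2.put(3L, 0.5);
        row2.put(4L, 0.6);
        row2.put(5L, 0.1);
        indicator.put(1L, row1);
        indicator.put(2L, row2);
        List<Long> userItems = new ArrayList<>();
        userItems.add(1L);
        userItems.add(2L);
        System.out.println(recommend(indicator, userItems, 2));
    }
}
```

In the example, item 3 collects strength from both of the user's items and so ranks ahead of item 4.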
> 
> I did try out the item-based recommender, but maybe it took a lot of computational time
because I tried out the Boolean indicator matrix based recommender. ( GenericBooleanPrefItemBasedRecommender
)

Using the in-memory recommender may be a problem if your data is very large. If you do use
it, you could implement a check of how much data the user has, and then either ask the recommender
for recs given a userID or collect up similar items using recommender.mostSimilarItems(long[] itemIDs,
int howMany) from the interactions you do have. There are other ways to do this with the
Hadoop version of the item-based recommender.

  /**
   * @param itemIDs
   *          IDs of item for which to find most similar other items
   * @param howMany
   *          desired number of most similar items to find
   * @return items most similar to the given items, ordered from most similar to least
   * @throws TasteException
   *           if an error occurs while accessing the {@link org.apache.mahout.cf.taste.model.DataModel}
   */
  List<RecommendedItem> mostSimilarItems(long[] itemIDs, int howMany) throws TasteException;
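
One way to wire up that check, sketched against a small hypothetical interface rather than Mahout's own classes. Only the shape of mostSimilarItems(long[], int) mirrors the signature above; SimpleRecommender, recommendForUser, and the minInteractions threshold are made up for illustration, and plain item IDs stand in for RecommendedItem:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the recommender; NOT Mahout's interface.
interface SimpleRecommender {
    List<Long> recommendForUser(long userID, int howMany);
    List<Long> mostSimilarItems(long[] itemIDs, int howMany);
}

class SparseUserCheck {

    // Users with enough history get ordinary user-keyed recommendations;
    // users with only a few interactions get items similar to what they touched.
    static List<Long> recommend(SimpleRecommender rec, long userID, long[] userItemIDs,
                                int minInteractions, int howMany) {
        if (userItemIDs.length >= minInteractions) {
            return rec.recommendForUser(userID, howMany);
        }
        return rec.mostSimilarItems(userItemIDs, howMany);
    }

    public static void main(String[] args) {
        // Stub that returns fixed lists so the branch taken is visible.
        SimpleRecommender stub = new SimpleRecommender() {
            public List<Long> recommendForUser(long userID, int howMany) {
                return Arrays.asList(100L);
            }
            public List<Long> mostSimilarItems(long[] itemIDs, int howMany) {
                return Arrays.asList(200L);
            }
        };
        System.out.println(recommend(stub, 7L, new long[] {1L, 2L}, 5, 10));
    }
}
```

Where to set the threshold is a judgment call that depends on your data; the sketch only shows the dispatch.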
 
> 
> You are certainly welcome here but questions like this usually go to the user@mahout.apache.org
list.
> 
> Thanks for pointing it out! I'll be careful from now on. 
> 
> On Wed, Jun 11, 2014 at 4:37 AM, Pat Ferrel <pat@occamsmachete.com> wrote:
> There are simple ways to do this without maintaining a separate recommender.
> 
> First you can simply cluster the input matrix of users by items. Then recommend items
closest to the centroid of the cluster the user’s couple of items were in. But this seems
dubious for several reasons.
> 
> Better yet (maybe controversial since I don’t know the mathematical justification for
this) but you could cluster the indicator matrix of items by similar items. This is at least
clustering “important” similar items.
> 
> But it is even easier than clustering: if you know a couple of items the user has preferred,
just get the most similar to those directly from the indicator matrix. The indicator matrix
is organized by an item per row and each row has similar items by strength of similarity.
Add all the rows the user has interacted with (using the strength values), sort, and recommend
the top n. The in-memory item-based recommender will give you the similar items for each item
the user preferred; all you need to do is add and sort.
> 
> To truly solve the cold start problem you have to handle items and/or users with no interactions.
This calls for a metadata recommender and some context. If a user is on a page of a product
with no interactions, the metadata must tell which items are similar. In the case where you
have a user with no interactions and no context, you have to rely on things like the time-worn
popular and trending items.
> 
> You are certainly welcome here but questions like this usually go to the user@mahout.apache.org
list.
> 
> On Jun 10, 2014, at 4:50 AM, Sahil Sharma <ssahil08@gmail.com> wrote:
> 
> Hi,
> 
> One place where tree-based recommenders (that is, using hierarchical
> clustering) might be useful is a cold start problem. That is, suppose a
> user has only bought a few items (say 2 or 3). It's kind of hard to
> capture that user's interests using a user-based collaborative filtering
> recommender.
> Also, the use of an item-based collaborative filtering recommender turns out to
> be time consuming.
> In such a setting it makes sense to cluster the items together (using some
> clustering algorithm) and then use the user's purchased items to
> recommend (based on which cluster those purchased items belong to).
> On Jun 10, 2014 4:41 PM, "Sebastian Schelter" <ssc@apache.org> wrote:
> 
> > Hi Sahil,
> >
> > don't worry, you're not breaking any rules. We removed the tree-based
> > recommenders because we have never heard of anyone using them over the
> > years.
> >
> > --sebastian
> >
> > On 06/10/2014 09:01 AM, Sahil Sharma wrote:
> >
> >> Hi,
> >>
> >> Firstly I apologize if I'm breaking certain rules by mailing this way, I'm
> >> new to this and would appreciate any help I could get.
> >>
> >> I was just playing around with the tree-based recommender (which seems to
> >> be deprecated in the current version "for the lack of use").
> >>
> >> Why was it deprecated?
> >>
> >> Also, I just looked at the code, and it seems to be doing a lot of
> >> redundant computation. For example, we could store a matrix of
> >> cluster-cluster distances (and hence avoid recomputing the closest
> >> clusters every time, by updating the matrix whenever we merge two clusters).
> >> Also, when trying to determine the farthest-distance-based similarity
> >> between two clusters, the pair which realizes this could be stored
> >> and updated upon merging, so that this computation need not be repeated
> >> again and again.
> >>
> >> Just wondering if this repeated computation was not a reason for
> >> deprecating the class (since people might have found a slow recommender
> >> "lacking use").
> >>
> >> Would be glad to hear the thoughts of others on this, and also to implement
> >> an efficient version if the community agrees.
> >>
> >>
> >
> 
> 
> 
> 
> -- 
> Best,
> Sahil
> 
