mahout-user mailing list archives

From Ahmed Abdeen Hamed <ahmed.elma...@gmail.com>
Subject Re: Merging similarities from two different approaches
Date Fri, 23 Mar 2012 20:33:51 GMT
Hello Sean,

Thanks very much for the detailed response! The proximity() is actually a
similarity metric, not a distance. In my earlier implementations I used
tfidf.distance, hence the comment in the code that says distance.

I am working on decoupling the content-based implementation from the
sales-based implementation. So, thank you for that.
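
Concretely, the decoupling I have in mind looks something like this (a rough
sketch; ContentSimilarity stands in for my tag/name/brand/category code, while
LogLikelihoodSimilarity and DataModel are the real Mahout classes):

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;

    // Placeholder for the content-based code (name, brand, category tags).
    interface ContentSimilarity {
      double similarity(long itemID1, long itemID2);
    }

    // Keep the two notions of similarity in two separate objects rather
    // than computing both inside one method.
    public class DecoupledSimilarities {

      private final ContentSimilarity content;
      private final LogLikelihoodSimilarity sales;

      public DecoupledSimilarities(ContentSimilarity content, DataModel salesModel) {
        this.content = content;
        this.sales = new LogLikelihoodSimilarity(salesModel);
      }

      public double contentScore(long itemID1, long itemID2) {
        return content.similarity(itemID1, itemID2);   // assumed in [0,1]
      }

      public double salesScore(long itemID1, long itemID2) throws TasteException {
        return sales.itemSimilarity(itemID1, itemID2); // Mahout's LLR similarity
      }
    }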

As for merging the scores, I need an OR rule, which translates to addition.
If I used AND, the likelihood would get smaller because the probabilities
would be multiplied, and that would restrict the clusters to items that
appear in the intersection of content-based similarity AND sales
correlation. Does this sound right to you?

A very important issue I am facing now is evaluation: how do we evaluate
the clusters that result from a TreeClusteringRecommender?
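
The closest tool I can point to is Mahout's generic hold-out evaluator,
though it measures prediction error rather than cluster quality. A sketch of
wiring it to a TreeClusteringRecommender (the similarity and the numClusters
value of 10 are placeholders):

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.recommender.FarthestNeighborClusterSimilarity;
    import org.apache.mahout.cf.taste.impl.recommender.TreeClusteringRecommender;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class ClusterEvalSketch {

      // 'merged' stands in for whatever combined UserSimilarity we end up with
      static double score(DataModel model, final UserSimilarity merged)
          throws TasteException {
        RecommenderBuilder builder = new RecommenderBuilder() {
          public Recommender buildRecommender(DataModel training) throws TasteException {
            return new TreeClusteringRecommender(training,
                new FarthestNeighborClusterSimilarity(merged), 10);
          }
        };
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        // train on 70% of each user's preferences, test on the remaining 30%
        return evaluator.evaluate(builder, null, model, 0.7, 1.0);
      }
    }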

I would appreciate any insight.

Thanks so much for this lively discussion!

-Ahmed

On Thu, Mar 22, 2012 at 6:17 PM, Sean Owen <srowen@gmail.com> wrote:

> Yes, but you can't use it as both things at once. I meant that you
> swap them at the broadest level -- at your original input. So all
> "items" are really users and vice versa. At the least you need two
> separate implementations, encapsulating two different notions of
> similarity.
>
> Similarity is item-item or user-user, not item-user. It makes some
> sense to implement item-item similarity based on tags, so the first
> half of the method looks OK (except that I'd expect you to implement
> itemSimilarity()).
>
> I think the other half makes more sense if you are calling
> getUsersForItem() -- input is item, output are users.
>
> As for the final line -- my original comment stands, though it's right
> for a wrong reason. You are not combining two distances here. You're
> combining a similarity value and a distance (right? proximity is a
> distance function?) and that's definitely not right. They go opposite
> ways: big distance means small similarity.
>
> If you handle two similarities, the simple thing that is in the
> ballpark of theoretically sound is to take their product.
>
>
> On Thu, Mar 22, 2012 at 9:48 PM, Ahmed Abdeen Hamed
> <ahmed.elmasri@gmail.com> wrote:
> > You are correct. In a previous post, I inquired about the use of
> > TreeClusteringRecommender, which is based upon a UserSimilarity metric.
> > My question was whether I could use it for ItemSimilarity, and your
> > answer was yes: just feed the itemID as a userID and vice versa. That
> > is what I am doing in the method; this is what this code does.
> >
> > The purpose of this method is to derive a similarity that is based on
> > item attributes (name, brand, category) in addition to what the
> > loglikelihood offers, so I am guaranteed to get recommendations for
> > items such as ("The Matrix" and "The Matrix Reloaded") even if they
> > never co-occur in the data model. This is why I need to merge the two
> > scores somehow.
> >
> > Thanks again!
> > Ahmed
>
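
For reference, the input-level swap Sean describes above boils down to
transposing the preference data before building the DataModel. A minimal
sketch (the file names are made up):

    import java.io.*;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.model.DataModel;

    public class TransposePrefs {
      public static void main(String[] args) throws Exception {
        // Original lines are userID,itemID[,value]; write itemID,userID[,value]
        // so that every Mahout "user" is really an item, and vice versa.
        BufferedReader in = new BufferedReader(new FileReader("prefs.csv"));
        PrintWriter out = new PrintWriter(new FileWriter("prefs-transposed.csv"));
        String line;
        while ((line = in.readLine()) != null) {
          String[] f = line.split(",");
          out.print(f[1] + "," + f[0]);          // swap the first two columns
          for (int i = 2; i < f.length; i++) {
            out.print("," + f[i]);               // keep any preference value
          }
          out.println();
        }
        in.close();
        out.close();
        // From here on, item-item similarity can be computed with any
        // UserSimilarity over this transposed model.
        DataModel itemsAsUsers = new FileDataModel(new File("prefs-transposed.csv"));
      }
    }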
