mahout-user mailing list archives

From Sean Owen
Subject Re: Compute similarities for an hudge quantity of data
Date Tue, 07 Jul 2009 08:26:15 GMT
There are two ways to approach this.

1. Use an implementation of UserNeighborhood, such as
NearestNUserNeighborhood, to compute a neighborhood for every one of
your users. Just call getUserNeighborhood() for each user ID. This is
going to take a while -- it will ultimately look at the similarity
between every pair of users if you call this for all users. I suggest
wrapping your UserSimilarity in a CachingUserSimilarity to speed
things up, especially if your similarity computation is slow or data
is in a database.

2. Try TreeClusteringRecommender. Call getClusters(). This will give
you disjoint clusters of similar users, which you could also think
of, and use, as "neighborhoods".
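
To make the cost of option 1 concrete, here is a minimal, library-free
sketch of what "a nearest-N neighborhood for every user, backed by a
caching similarity" amounts to. It deliberately does not use the Mahout
classes: Similarity, CachingSimilarity, nearestN and allNeighborhoods
are hypothetical stand-ins invented for illustration, and the cache
assumes the similarity is symmetric.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NeighborhoodSketch {

    // Hypothetical stand-in for a user-user similarity; NOT Mahout's UserSimilarity.
    interface Similarity {
        double between(long a, long b);
    }

    // Memoizes a similarity that is assumed symmetric, so each unordered
    // pair of users is computed by the delegate at most once.
    static final class CachingSimilarity implements Similarity {
        private final Similarity delegate;
        private final Map<String, Double> cache = new HashMap<>();
        int misses = 0; // number of times the delegate was actually called

        CachingSimilarity(Similarity delegate) {
            this.delegate = delegate;
        }

        @Override
        public double between(long a, long b) {
            String key = Math.min(a, b) + ":" + Math.max(a, b);
            Double cached = cache.get(key);
            if (cached == null) {
                misses++;
                cached = delegate.between(a, b);
                cache.put(key, cached);
            }
            return cached;
        }
    }

    // Returns up to n other users, most similar first.
    static List<Long> nearestN(long user, List<Long> allUsers, int n, Similarity sim) {
        List<Long> others = new ArrayList<>();
        for (long u : allUsers) {
            if (u != user) {
                others.add(u);
            }
        }
        others.sort(Comparator.comparingDouble((Long u) -> sim.between(user, u)).reversed());
        return new ArrayList<>(others.subList(0, Math.min(n, others.size())));
    }

    // Computes a neighborhood for every user; this touches every pair of users.
    static Map<Long, List<Long>> allNeighborhoods(List<Long> users, int n, Similarity sim) {
        Map<Long, List<Long>> result = new HashMap<>();
        for (long u : users) {
            result.put(u, nearestN(u, users, n, sim));
        }
        return result;
    }
}
```

With k users the loop still needs all k*(k-1)/2 pairwise similarities,
but the cache means each pair is computed once rather than once per
direction or once per comparison during sorting.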

In neither case do you, the caller, need to compute all pairwise
similarities yourself. That said, I am still not sure I am answering
your question properly. If you say more about what you are trying to
accomplish, I could suggest a better solution.

On Tue, Jul 7, 2009 at 1:04 AM, charlysf wrote:
> Thanks, that's what I thought.
> Now I would like to store a neighborhood for all my users. If, for one
> user, I need to compute the similarity between that user and all other
> users, do I have to compute all pairs? Or is there something better?
> Does the getNeighborhood method do that, i.e. compare against all users?
