mahout-user mailing list archives

From Sean Owen <>
Subject Re: Item similarity very slow
Date Mon, 22 Jun 2009 20:13:49 GMT
First, do you have indexes and constraints set on the table? The
primary key should be a composite key of these two IDs, and both
columns should have an index. Both should be non-null.

Are you wrapping TanimotoCoefficientSimilarity in a CachingSimilarity
wrapper? This will at least cache the similarity computations.
It won't help the first run, but it will help subsequent runs a lot.
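
To illustrate why caching helps only after the first run, here is a
minimal, self-contained sketch. It does not use the Mahout classes; the
class name CachedTanimoto, its methods, and the in-memory map of item
IDs to subject IDs are all assumptions made for the example. It computes
the Tanimoto coefficient (shared subjects divided by total distinct
subjects) and remembers each pair's result.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not part of Mahout.
class CachedTanimoto {
    // item ID -> set of subject IDs linked to that item
    private final Map<Long, Set<Long>> subjectsByItem;
    // "smallerId:largerId" -> similarity already computed for that pair
    private final Map<String, Double> cache = new HashMap<>();
    private int computations = 0;

    CachedTanimoto(Map<Long, Set<Long>> subjectsByItem) {
        this.subjectsByItem = subjectsByItem;
    }

    // Tanimoto coefficient: |A intersect B| / |A union B|.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // no subjects at all: treat as no similarity
        }
        Set<Long> small = a.size() <= b.size() ? a : b;
        Set<Long> large = (small == a) ? b : a;
        int intersection = 0;
        for (Long subject : small) {
            if (large.contains(subject)) {
                intersection++;
            }
        }
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    // Returns the cached value when present; computes and stores it otherwise.
    double similarity(long item1, long item2) {
        long lo = Math.min(item1, item2);
        long hi = Math.max(item1, item2);
        String key = lo + ":" + hi; // order-independent key: similarity is symmetric
        Double cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        computations++;
        Set<Long> a = subjectsByItem.getOrDefault(lo, Collections.<Long>emptySet());
        Set<Long> b = subjectsByItem.getOrDefault(hi, Collections.<Long>emptySet());
        double value = tanimoto(a, b);
        cache.put(key, value);
        return value;
    }

    // Number of similarities actually computed (cache misses).
    int computations() {
        return computations;
    }
}
```

The first call for a pair pays the full cost; a repeated call for the
same pair, in either order, is a map lookup. With a JDBC-backed model
the first computation also includes the database round trips, which is
where the indexes matter.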

How many items do you have?

Let's start here and we can think of more solutions after we deal with
these questions.

On Mon, Jun 22, 2009 at 4:06 PM, charlysf<> wrote:
> Hello,
> I would like to compute the item similarity for my data.
> I have this table :
> item_id, subject_id
> An item is linked to a subject, which is a Taste, so I would like to have
> the similarity between items, in fact, if they have the same subjects, or
> not...
> I tried to implement an AbstractJDBCDataModel for my database, and as I have
> some boolean relationship between my item and my subject, I compute
> similarities with TanimotoCoefficientSimilarity.
> My recommender is GenericItemBasedRecommender and I use a
> CachingRecommender.
> In fact, do I have a better solution than :
> for each item as item1
>     give me the neighborhood(item1)
> To retrieve the first neighborhood, I need around 20sec !
> This is my log :
