mahout-user mailing list archives

From charlysf <charles.rue...@gmail.com>
Subject Re: Item similarity very slow
Date Mon, 22 Jun 2009 20:31:11 GMT

My table has 21,000 rows and 10,000 distinct article IDs.

article_id and subject_id are both NOT NULL.
The table has an auto-increment primary key, so instead of a composite
primary key there is a unique index on (article_id, subject_id).
There is also an index on subject_id.

All indexes are B-tree.

I do use CachingSimilarity, but it doesn't really help here, because I only
want to compute the similarities for one item.
Is it normal that, each time, a new query is run for "Retrieving
number of user preferring item in model 25" and "Retrieving number of user
preferring items in model 25", and that the item is compared against all rows?
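
To make the question concrete: the Tanimoto coefficient of two items is the
size of the intersection of their subject sets divided by the size of the
union, so every pair needs two per-item counts and one joint count. That is
consistent with the two kinds of log lines above, although whether Mahout
issues them as separate SQL queries per pair is an assumption here. Below is
a minimal sketch, not Mahout's implementation, that assumes the 21,000
(article_id, subject_id) rows have already been loaded into memory once.
The class name TanimotoSketch and both method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Mahout: in-memory Tanimoto similarity
// between articles, where each article is described by its set of subject IDs.
class TanimotoSketch {

    /** Tanimoto (Jaccard) coefficient: |A intersect B| / |A union B|. */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // convention chosen for this sketch
        }
        // Iterate over the smaller set and probe the larger one.
        Set<Long> smaller = a.size() <= b.size() ? a : b;
        Set<Long> larger = (smaller == a) ? b : a;
        int intersection = 0;
        for (Long id : smaller) {
            if (larger.contains(id)) {
                intersection++;
            }
        }
        // |A union B| = |A| + |B| - |A intersect B|
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    /** Similarity of one target article to every other article. */
    static Map<Long, Double> similaritiesFor(long target,
                                             Map<Long, Set<Long>> subjectsByArticle) {
        Set<Long> targetSubjects = subjectsByArticle.getOrDefault(target, Set.of());
        Map<Long, Double> result = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> entry : subjectsByArticle.entrySet()) {
            if (entry.getKey().longValue() != target) {
                result.put(entry.getKey(), tanimoto(targetSubjects, entry.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up example data: article ID -> subject IDs.
        Map<Long, Set<Long>> subjectsByArticle = new HashMap<>();
        subjectsByArticle.put(1L, Set.of(10L, 20L, 30L));
        subjectsByArticle.put(2L, Set.of(20L, 30L, 40L));
        subjectsByArticle.put(3L, Set.of(50L));
        System.out.println(similaritiesFor(1L, subjectsByArticle));
    }
}
```

With the data in memory, comparing one article against the other 10,000 is
a single pass over hash sets and needs no further round trips to the
database.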


srowen wrote:
> 
> First, do you have indexes and constraints set on the table? the
> primary key should be a composite key, of these two IDs, and both
> should have an index. Both should be non-null.
> 
> Are you wrapping TanimotoCoefficientSimilarity in a CachingSimilarity
> wrapper? this will at least help it cache the similarity computations.
> It won't help the first run, but will help subsequent runs a lot.
> 
> How many items do you have?
> 
> Let's start here and we can think of more solutions after we deal with
> these questions.
> 
> On Mon, Jun 22, 2009 at 4:06 PM, charlysf<charles.ruelle@gmail.com> wrote:
>>
>> Hello,
>>
>> I would like to compute the item similarity for my data.
>>
>> I have this table :
>>
>> item_id, subject_id
>>
>> An item is linked to a subject, which is a Taste, so I would like to have
>> the similarity between items, in fact, if they have the same subjects, or
>> not...
>>
>> I tried to implement an AbstractJDBCDataModel for my database, and as I
>> have some boolean relationship between my item and my subject, I compute
>> similarities with TanimotoCoefficientSimilarity.
>>
>> My recommender is GenericItemBasedRecommender and I use a
>> CachingRecommender.
>>
>> In fact, do I have a better solution than :
>>
>> for each item as item1
>>     give me the neighborhood(item1)
>>
>>
>> To retrieve the first neighborhood, I need around 20sec !
>>
>> This is my log :
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Item-similarity-very-slow-tp24154435p24154834.html
Sent from the Mahout User List mailing list archive at Nabble.com.

