mahout-user mailing list archives

From Sebastian Schelter <ssc.o...@googlemail.com>
Subject Re: How to combine boolean datamodel with datamodel
Date Thu, 22 Jul 2010 13:21:10 GMT
The patch is attached here: https://issues.apache.org/jira/browse/MAHOUT-445

If you want to use the test code you sent yesterday with the patch, you
need to change the way the recommender is created to:

    new GenericItemBasedRecommender(model, itemSimilarity,
        new AllUnknownItemsCandidateItemsStrategy())
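
For readers without the patch at hand, the idea behind an "all unknown items" strategy can be sketched in plain Java. This is only an illustration under assumptions: the class and method names below are hypothetical and are not Mahout's actual interfaces from MAHOUT-445. It simply returns every known item the user has not rated yet.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, self-contained illustration; not Mahout's actual classes.
final class AllUnknownItemsSketch {

    /** Returns every item ID in allItemIds that is not in ratedByUser. */
    public static Set<Long> candidateItems(Set<Long> allItemIds, Set<Long> ratedByUser) {
        // Copy first so the caller's set of all items is left untouched.
        Set<Long> candidates = new LinkedHashSet<>(allItemIds);
        candidates.removeAll(ratedByUser);
        return candidates;
    }

    public static void main(String[] args) {
        Set<Long> allItems = new LinkedHashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L));
        Set<Long> rated = new LinkedHashSet<>(Arrays.asList(2L, 4L));
        System.out.println(candidateItems(allItems, rated)); // prints [1, 3, 5]
    }
}
```

The candidate set grows with the size of the catalogue, which is why this approach is unlikely to suit large datasets.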

--sebastian

On 22.07.2010 15:15, Young wrote:
> Hi Sebastian,
> Thank you. Where can we download the patch?
>  
> ---Young
>
>> Hi all,
>>
>> I did a little refactoring today to be able to inject customized ways of
>> fetching the candidate items. I wrote another implementation that just
>> returns all items not yet rated by the user. This won't be suitable for
>> large datasets, but it did quite well on the GroupLens dataset (some
>> testing results attached). I'm going to create a patch so you can have a
>> look at the refactoring. If you decide to commit it, it could be a
>> suitable starting point for implementing Ted's proposed way of candidate
>> item fetching.
>>
>> Another advantage of that patch is that users could supply use-case
>> specific implementations of candidate item fetching without having to
>> subclass the recommender of their choice.
>>
>> --sebastian
>>
>> Tests for random users with different candidate item fetching strategies
>> (grouplens dataset)
>>
>> User 1063
>> found 3605 items in 2376ms (current approach)
>> found 3606 items in 1ms (all unknown items)
>>
>> User 3596
>> found 3575 items in 1889ms (current approach)
>> found 3578 items in 2ms (all unknown items)
>>
>> User 3300
>> found 3343 items in 6603ms (current approach)
>> found 3344 items in 0ms (all unknown items)
>>
>> User 924
>> found 3507 items in 4173ms (current approach)
>> found 3507 items in 4ms (all unknown items)
>>
>> User 4505
>> found 3427 items in 4774ms (current approach)
>> found 3427 items in 1ms (all unknown items)
>>
>> User 3378
>> found 3471 items in 4225ms (current approach)
>> found 3471 items in 0ms (all unknown items)
>>
>> User 246
>> found 3673 items in 730ms (current approach)
>> found 3677 items in 0ms (all unknown items)
>>
>>
>> On 22.07.2010 02:00, Ted Dunning wrote:
>>     
>>> This is a ubiquitous problem with cooccurrence algorithms, since they scale with
>>> the square of the number of occurrences of the most popular item.
>>>
>>> The good news is that you learn everything there is to learn about that item
>>> if you look at just a sampling of the occurrences so sampling is your
>>> friend.  If there is temporal structure, I tend to bias the sample toward
>>> recent items.
>>>
>>> Regarding the size, I have generally had an arbitrary cutoff attached to a
>>> configuration knob in my production systems.  It is probably reasonable to
>>> set this limit to something like max(100, 20*log(max(N_users, N_items))).
>>>  This isn't really any less arbitrary, but it will probably never need
>>> tweaking in normal use.
>>>   
>>>       
>>     

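The cutoff Ted suggests in the quoted message, max(100, 20*log(max(N_users, N_items))), combined with sampling of occurrences, might look roughly like the sketch below. Several things here are assumptions rather than anything stated in the thread: the natural logarithm, rounding up, uniform sampling (no bias toward recent items), and all class and method names, which are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; names and rounding are assumptions, not Mahout code.
final class OccurrenceSamplingSketch {

    /** max(100, 20 * ln(max(numUsers, numItems))), rounded up (natural log assumed). */
    public static int cutoff(long numUsers, long numItems) {
        double scaled = 20.0 * Math.log(Math.max(numUsers, numItems));
        return (int) Math.max(100L, (long) Math.ceil(scaled));
    }

    /** Uniformly down-samples occurrences to at most limit entries. */
    public static <T> List<T> sample(List<T> occurrences, int limit, Random random) {
        if (occurrences.size() <= limit) {
            return new ArrayList<>(occurrences);
        }
        // Shuffle a copy and keep the first limit entries.
        List<T> shuffled = new ArrayList<>(occurrences);
        Collections.shuffle(shuffled, random);
        return new ArrayList<>(shuffled.subList(0, limit));
    }

    public static void main(String[] args) {
        // Made-up sizes, for illustration only.
        int limit = cutoff(1_000_000L, 50_000L);
        List<Integer> occurrences = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            occurrences.add(i);
        }
        List<Integer> sampled = sample(occurrences, limit, new Random(42L));
        System.out.println(limit + " " + sampled.size()); // prints 277 277
    }
}
```

With a natural log the floor of 100 applies until the larger of the two counts exceeds roughly 148, so on small datasets the limit is effectively the constant.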
