mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: Recommend items not rated by any user
Date Wed, 05 Mar 2014 17:16:44 GMT
For SVD-based algorithms, you should use the AllUnknownItems
strategy then, that's correct.

In the majority of industry use cases that I have seen, people use
pre-computed item similarities (Mahout has lots of machinery for doing 
this, btw), so AllSimilarItems totally makes sense there.

--sebastian
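
The difference between the two strategies discussed in this thread can be sketched in plain Java. This is only a model of the semantics as described in the messages below, not Mahout's implementation: the class `CandidateStrategySketch`, its helper methods, and the explicit `catalogue` set are assumptions made for illustration, and item 6 is a hypothetical item with no similarities that was added to make the two result sets differ. The preferences and similarities are the ones quoted further down in the thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java model of the two candidate strategies as described in this thread (not Mahout code). */
public class CandidateStrategySketch {

    /** Symmetric similarity table; pairs that were never supplied yield NaN. */
    static final class Similarities {
        private final Map<String, Double> values = new HashMap<>();

        void put(long a, long b, double value) {
            values.put(key(a, b), value);
        }

        double get(long a, long b) {
            return values.getOrDefault(key(a, b), Double.NaN);
        }

        private static String key(long a, long b) {
            return Math.min(a, b) + ":" + Math.max(a, b);
        }
    }

    /** Models AllUnknownItems: every catalogue item the user has not interacted with. */
    static Set<Long> allUnknown(Set<Long> catalogue, Set<Long> userItems) {
        Set<Long> candidates = new TreeSet<>(catalogue);
        candidates.removeAll(userItems);
        return candidates;
    }

    /** Models AllSimilarItems: unknown items with a non-NaN similarity to at least one user item. */
    static Set<Long> allSimilar(Set<Long> catalogue, Set<Long> userItems, Similarities sim) {
        Set<Long> candidates = new TreeSet<>();
        for (long item : allUnknown(catalogue, userItems)) {
            for (long known : userItems) {
                if (!Double.isNaN(sim.get(item, known))) {
                    candidates.add(item);
                    break;
                }
            }
        }
        return candidates;
    }

    /** The item-item similarities quoted in the thread. */
    static Similarities threadSimilarities() {
        Similarities sim = new Similarities();
        sim.put(1, 2, 0.1);
        sim.put(1, 3, 0.2);
        sim.put(1, 4, 0.3);
        sim.put(2, 3, 0.5);
        sim.put(3, 4, 0.5);
        sim.put(5, 1, 0.2);
        sim.put(5, 2, 1.0);
        return sim;
    }

    public static void main(String[] args) {
        // Items 1 to 5 come from the thread; item 6 is hypothetical and has no similarities.
        Set<Long> catalogue = Set.of(1L, 2L, 3L, 4L, 5L, 6L);
        Set<Long> user1Items = Set.of(1L, 2L, 3L, 4L);
        Similarities sim = threadSimilarities();

        System.out.println("AllUnknown: " + allUnknown(catalogue, user1Items)); // [5, 6]
        System.out.println("AllSimilar: " + allSimilar(catalogue, user1Items, sim)); // [5]
    }
}
```

With only the data from the thread, both sets for user 1 would contain just item 5, which matches the observation below that the two strategies gave the same output in those tests. They diverge only once the catalogue holds an item with no known similarity to anything the user has rated.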

On 03/05/2014 06:01 PM, Tevfik Aytekin wrote:
> It can even make things worse in SVD-based algorithms for which
> preference estimation is very fast.
>
> On Wed, Mar 5, 2014 at 7:00 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com> wrote:
>> Hi Sebastian,
>> But in order not to select items that are not similar to at least one
>> of the items the user interacted with you have to compute the
>> similarity with all user items (which is the main task for estimating
>> the preference of an item in item-based method). So, it seems to me
>> that AllSimilarItemsStrategy does not bring much advantage over
>> AllUnknownItemsCandidateItemsStrategy.
>>
>> On Wed, Mar 5, 2014 at 6:46 PM, Sebastian Schelter <ssc@apache.org> wrote:
>>>> So both strategies seem to be effectively the same, I don't know what
>>>> the implementers had in mind when designing
>>>> AllSimilarItemsCandidateItemsStrategy.
>>>
>>> It can take a long time to estimate preferences for all items a user doesn't
>>> know. Especially if you have a lot of items. Traditional item-based
>>> recommenders will not recommend any item that is not similar to at least one
>>> of the items the user interacted with, so AllSimilarItemsStrategy already
>>> selects the maximum set of items that could be potentially recommended to
>>> the user.
>>>
>>> --sebastian
>>>
>>>
>>>
>>>
>>> On 03/05/2014 05:38 PM, Tevfik Aytekin wrote:
>>>>
>>>> If the similarity between item 5 and two of the items user 1 preferred are
>>>> not NaN then it will return 1, that is what I'm saying. If the
>>>> similarities were all NaN then it will not return it.
>>>>
>>>> But surely, you might wonder if all similarities between an item and
>>>> user's items are NaN, then AllUnknownItemsCandidateItemsStrategy probably
>>>> will not return it.
>>>>
>>>
>>>> On Wed, Mar 5, 2014 at 6:06 PM, Juan José Ramos <jjarmos@gmail.com> wrote:
>>>>>
>>>>> @Tevfik, running this recommender:
>>>>>
>>>>> GenericItemBasedRecommender itemRecommender =
>>>>>     new GenericItemBasedRecommender(dataModel, itemSimilarity,
>>>>>         new AllSimilarItemsCandidateItemsStrategy(itemSimilarity),
>>>>>         new AllSimilarItemsCandidateItemsStrategy(itemSimilarity));
>>>>>
>>>>>
>>>>> With this dataModel:
>>>>> 1,1,1.0
>>>>> 1,2,2.0
>>>>> 1,3,1.0
>>>>> 1,4,2.0
>>>>> 2,1,1.0
>>>>> 2,2,4.0
>>>>>
>>>>>
>>>>> And these similarities
>>>>> 1,2,0.1
>>>>> 1,3,0.2
>>>>> 1,4,0.3
>>>>> 2,3,0.5
>>>>> 3,4,0.5
>>>>> 5,1,0.2
>>>>> 5,2,1.0
>>>>>
>>>>> Returns item 5 for User 1. So item 5 has not been preferred by user 1,
>>>>> and the similarities between item 5 and two of the items user 1
>>>>> preferred are not NaN, but AllSimilarItemsCandidateItemsStrategy is
>>>>> returning that item. So, I'm truly sorry to insist on this, but I still
>>>>> really do not get the difference.
>>>>>
>>>>>
>>>>> On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com> wrote:
>>>>>
>>>>>> Juan,
>>>>>> You got me wrong,
>>>>>>
>>>>>> AllSimilarItemsCandidateItemsStrategy
>>>>>>
>>>>>> returns all items that have not been rated by the user and the
>>>>>> similarity metric returns a non-NaN similarity value with at
>>>>>> least one of the items preferred by the user.
>>>>>>
>>>>>> So, it does not simply return all items that have not been rated by
>>>>>> the user. For example, if there is an item X which has not been rated
>>>>>> by the user and if the similarity value between X and at least one of
>>>>>> the items rated (preferred) by the user is not NaN, then X will not
>>>>>> be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
>>>>>> returned by AllUnknownItemsCandidateItemsStrategy.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos <jjarmos@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Tevfik,
>>>>>>>
>>>>>>> Thanks for the response. I think what you say contradicts what
>>>>>>> Sebastian pointed out before. Also, if
>>>>>>> AllSimilarItemsCandidateItemsStrategy returns all items that have not
>>>>>>> been rated by the user, what would
>>>>>>> AllUnknownItemsCandidateItemsStrategy return?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sorry there was a typo in the previous paragraph.
>>>>>>>>
>>>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>>>
>>>>>>>> returns all items that have not been rated by the user and the
>>>>>>>> similarity metric returns a non-NaN similarity value with at
>>>>>>>> least one of the items preferred by the user.
>>>>>>>>
>>>>>>>> On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Juan,
>>>>>>>>>
>>>>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>>>>
>>>>>>>>> returns all items that have not been rated by the user and the
>>>>>>>>> similarity metric returns a non-NaN similarity value that is with at
>>>>>>>>> least one of the items preferred by the user.
>>>>>>>>>
>>>>>>>>> Tevfik
>>>>>>>>>
>>>>>>>>> On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter <ssc@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> On 03/05/2014 01:23 PM, Juan José Ramos wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the reply, Sebastian.
>>>>>>>>>>>
>>>>>>>>>>> I am not sure if that should be implemented in the abstract base
>>>>>>>>>>> class though, because for instance
>>>>>>>>>>> PreferredItemsNeighborhoodCandidateItemsStrategy, by definition,
>>>>>>>>>>> returns the items not rated by the user and rated by somebody else.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Good point. So we seem to need special implementations.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Back to my last post, I have been playing around with
>>>>>>>>>>> AllSimilarItemsCandidateItemsStrategy and
>>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy, and although they both do
>>>>>>>>>>> what I wanted (recommend items not previously rated by any user), I
>>>>>>>>>>> honestly can't tell the difference between the two strategies. In
>>>>>>>>>>> my tests the output was always the same. If the eventual output of
>>>>>>>>>>> the recommender will not include items already rated by the user,
>>>>>>>>>>> as pointed out here (
>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E
>>>>>>>>>>> ), AllSimilarItemsCandidateItemsStrategy should be equivalent to
>>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy, shouldn't it?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> AllSimilarItems returns all items that are similar to any item that
>>>>>>>>>> the user already knows. AllUnknownItems simply returns all items
>>>>>>>>>> that the user has not interacted with yet.
>>>>>>>>>>
>>>>>>>>>> These are two different things, although they might overlap in some
>>>>>>>>>> scenarios.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Sebastian
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter <ssc@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Juan,
>>>>>>>>>>>>
>>>>>>>>>>>> that is a good catch. CandidateItemsStrategy is the right place to
>>>>>>>>>>>> implement this. Maybe we should simply extend its interface to add
>>>>>>>>>>>> a parameter that says whether to keep or remove the current user's
>>>>>>>>>>>> items?
>>>>>>>>>>>>
>>>>>>>>>>>> We could even do this in the abstract base class then.
>>>>>>>>>>>>
>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 03/05/2014 10:42 AM, Juan José Ramos wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> In case somebody runs into the same situation, the key seems to be
>>>>>>>>>>>>> in the CandidateItemsStrategy being passed to the constructor of
>>>>>>>>>>>>> GenericItemBasedRecommender. Looking into the code, if no
>>>>>>>>>>>>> CandidateItemsStrategy is specified in the constructor,
>>>>>>>>>>>>> PreferredItemsNeighborhoodCandidateItemsStrategy is used and, as
>>>>>>>>>>>>> the documentation says, the doGetCandidateItems method "returns all
>>>>>>>>>>>>> items that have not been rated by the user and that were preferred
>>>>>>>>>>>>> by another user that has preferred at least one item that the
>>>>>>>>>>>>> current user has preferred too".
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, a different CandidateItemsStrategy needs to be passed. For this
>>>>>>>>>>>>> problem, it seems to me that AllSimilarItemsCandidateItemsStrategy
>>>>>>>>>>>>> and AllUnknownItemsCandidateItemsStrategy are good candidates. Does
>>>>>>>>>>>>> anybody know where to find some documentation about the different
>>>>>>>>>>>>> CandidateItemsStrategy implementations? Based on the name I would
>>>>>>>>>>>>> say that:
>>>>>>>>>>>>> 1) AllSimilarItemsCandidateItemsStrategy returns all similar items
>>>>>>>>>>>>> regardless of whether they have been already rated by someone or
>>>>>>>>>>>>> not.
>>>>>>>>>>>>> 2) AllUnknownItemsCandidateItemsStrategy returns all similar items
>>>>>>>>>>>>> that have not been rated by anyone yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anybody know if it works like that?
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos <jjarmos@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> First thing is that I know this requirement would not make sense
>>>>>>>>>>>>>> in a CF Recommender. In my case, I am trying to use Mahout to
>>>>>>>>>>>>>> create something closer to a Content-Based Recommender.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In particular, I am pre-computing a similarity matrix between all
>>>>>>>>>>>>>> the documents (items) of my catalogue and using that matrix as the
>>>>>>>>>>>>>> ItemSimilarity for my Item-Based Recommender.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, when a user rates a document, how could I make the recommender
>>>>>>>>>>>>>> output documents similar to the ones the user has already rated
>>>>>>>>>>>>>> even if no other user in the system has rated them yet? Is that
>>>>>>>>>>>>>> even possible in the first place?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks a lot.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>
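
The remark quoted above, that a traditional item-based recommender will not recommend an item that is not similar to at least one of the user's items, can be illustrated with a simplified similarity-weighted average. This is a sketch under stated assumptions and not Mahout's estimation code: the class `ItemBasedEstimateSketch` and the method `estimate` are hypothetical names, and the treatment of missing similarities as NaN follows the discussion in this thread.

```java
import java.util.Map;

/** Simplified item-based preference estimate (a sketch, not Mahout's implementation). */
public class ItemBasedEstimateSketch {

    /**
     * Similarity-weighted average of the user's preferences.
     *
     * @param userPrefs             item ID to preference value for the items the user rated
     * @param similarityToCandidate item ID to similarity with the candidate item;
     *                              a missing entry means the similarity is unknown (NaN)
     * @return the estimate, or NaN when no rated item has a known similarity to the candidate
     */
    static double estimate(Map<Long, Double> userPrefs, Map<Long, Double> similarityToCandidate) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Double> pref : userPrefs.entrySet()) {
            double similarity = similarityToCandidate.getOrDefault(pref.getKey(), Double.NaN);
            if (Double.isNaN(similarity)) {
                continue; // unknown similarity contributes nothing
            }
            weightedSum += similarity * pref.getValue();
            totalSimilarity += similarity;
        }
        // No usable similarity at all: the item cannot be scored, so it cannot be recommended.
        return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
    }

    public static void main(String[] args) {
        // User 1's preferences as quoted in the thread.
        Map<Long, Double> user1Prefs = Map.of(1L, 1.0, 2L, 2.0, 3L, 1.0, 4L, 2.0);

        // Item 5 has known similarities to items 1 and 2 (values from the thread).
        Map<Long, Double> item5Similarities = Map.of(1L, 0.2, 2L, 1.0);
        System.out.println("item 5: " + estimate(user1Prefs, item5Similarities));

        // A hypothetical item with no known similarities cannot be estimated.
        Map<Long, Double> noSimilarities = Map.of();
        System.out.println("no similarities: " + estimate(user1Prefs, noSimilarities)); // NaN
    }
}
```

For item 5 the estimate is (0.2 * 1.0 + 1.0 * 2.0) / (0.2 + 1.0), roughly 1.83. An item whose similarities to all of the user's items are unknown yields NaN, so estimating it is wasted work, which is the saving a similarity-based candidate strategy offers when the similarities are pre-computed.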

