mahout-dev mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-195) doubt about SlopeOneRecommender
Date Thu, 05 Nov 2009 15:56:32 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773955#action_12773955 ]

Sean Owen commented on MAHOUT-195:
----------------------------------

You're right about how diffs are computed. Since the Y,Z diff is just the negative of Z,Y,
only one is stored internally, and it's always stored in smaller,bigger order. However, this
should be transparent to you. You can ask for the Y,Z or Z,Y diff and get the right answer.
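
To illustrate the idea of storing only one direction per pair, here is a minimal sketch. It is
not the actual MemoryDiffStorage code: the class CanonicalDiffStore, its method names, and the
convention that the diff for (a,b) means rating(b) minus rating(a) are all assumptions made
for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Mahout's MemoryDiffStorage.
// Assumed convention: the diff for (a, b) is rating(b) - rating(a).
final class CanonicalDiffStore {
    // smaller item ID -> bigger item ID -> {sum of diffs, number of diffs}
    private final Map<Long, Map<Long, double[]>> diffs = new HashMap<>();

    /** Records one observed diff for the pair (itemA, itemB), in either order. */
    void addDiff(long itemA, long itemB, double diff) {
        if (itemA == itemB) {
            return; // an item has no diff with itself
        }
        if (itemA > itemB) {
            // Normalize to (smaller, bigger); reversing the pair negates the diff.
            long tmp = itemA;
            itemA = itemB;
            itemB = tmp;
            diff = -diff;
        }
        double[] acc = diffs
            .computeIfAbsent(itemA, k -> new HashMap<>())
            .computeIfAbsent(itemB, k -> new double[2]);
        acc[0] += diff;
        acc[1] += 1.0;
    }

    /** Returns the average diff for (itemA, itemB), or null if the pair is unknown. */
    Double getAverageDiff(long itemA, long itemB) {
        boolean flipped = itemA > itemB;
        long lo = flipped ? itemB : itemA;
        long hi = flipped ? itemA : itemB;
        Map<Long, double[]> row = diffs.get(lo);
        if (row == null) {
            return null;
        }
        double[] acc = row.get(hi);
        if (acc == null) {
            return null;
        }
        double average = acc[0] / acc[1];
        // Only the (smaller, bigger) direction is stored; negate for the reverse.
        return flipped ? -average : average;
    }
}
```

With this layout, a contribution recorded as (Z,Y) and one recorded as (Y,Z) land in the same
accumulator, so a caller can ask for either direction and get a consistent answer.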

Yes, prefs and IDs must be ordered in some cases in the API. I think the only one that matters
here is the result of DataModel.getPreferencesFromUser(). It is ordered, but I am not sure the
docs say that the return value is ordered. I can clarify that. So that much should work.
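
The effect of ordered preferences on pair generation can be sketched as follows. This is an
illustration only: SortedPairsSketch and its methods are hypothetical names, the sort is done
explicitly here rather than relying on any DataModel behavior, and item IDs are assumed to be
distinct within one user's preferences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: shows that sorting a user's item IDs first makes every
// user produce pairs in the same (smaller, bigger) orientation.
final class SortedPairsSketch {
    /** Returns all pairs (i, j) with i < j from the given item IDs. */
    static List<long[]> pairs(long[] itemIDs) {
        long[] sorted = itemIDs.clone();
        Arrays.sort(sorted);
        List<long[]> result = new ArrayList<>();
        for (int i = 0; i < sorted.length; i++) {
            for (int j = i + 1; j < sorted.length; j++) {
                result.add(new long[] {sorted[i], sorted[j]});
            }
        }
        return result;
    }

    /** Formats pairs as "(a,b) (c,d) ..." for easy comparison. */
    static String format(List<long[]> pairs) {
        StringBuilder sb = new StringBuilder();
        for (long[] p : pairs) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('(').append(p[0]).append(',').append(p[1]).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // X=1, Y=2, Z=3: user A rated (X,Y,Z), user B rated (Z,X,Y).
        System.out.println(format(pairs(new long[] {1, 2, 3})));
        System.out.println(format(pairs(new long[] {3, 1, 2})));
    }
}
```

Both users yield the same three pairs in the same orientation, so no diff is ever recorded
under a reversed key.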

I agree with your other optimization. Actually, not sure why it was written that way to begin
with, unless I forget something. It may have made more sense way way back when item IDs were
not necessarily comparable.

> doubt about SlopeOneRecommender
> -------------------------------
>
>                 Key: MAHOUT-195
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-195
>             Project: Mahout
>          Issue Type: Question
>          Components: Collaborative Filtering
>            Reporter: Jens Grivolla
>            Priority: Minor
>
> Looking through the SlopeOne code in order to make some changes, I am having some doubts
> about how MemoryDiffStorage handles things.
> It looks to me like buildAverageDiffs(), or rather processOneUser() inserts the item
> pairs in the order they appear in userPreferences, as obtained from
> dataModel.getPreferencesFromUser(userID).
> So if user A has items (X,Y,Z) we obtain the pairs (X,Y),(X,Z),(Y,Z) and update their
> averages,
> if user B has items (Z,X,Y) we obtain (Z,X),(Z,Y),(X,Y).
> When using getDiff for (Y,Z) it will not look for the (Z,Y) average that user B contributes
> to, as the average for (Y,Z) is not null.
> Unless we know that preferences are always ordered, e.g. by itemID, this seems like a
> bug.  I have not found any mention of it being ordered in the documentation of DataModel or
> PreferenceArray.  If the items are ordered it would seem to be easier to check the order in
> getDiff(x,y) instead of trying one, then the other.
> P.S.: I tried to ask on mahout-users, but my message never appeared on the list. There
> might be some kind of filter rejecting the plus sign in my address or something like that,
> but it's the one where I receive the list messages.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

