mahout-dev mailing list archives

From "Anatoliy Kats (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set
Date Wed, 14 Dec 2011 15:13:30 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169423#comment-13169423 ]

Anatoliy Kats commented on MAHOUT-906:
--------------------------------------

I have to head out, so let me ask you a question before I do.  Based on what I have seen so
far, it seems that I need to factor out AbstractDifferenceRecommenderEvaluator::call() so that
it can check for relevant items instead of simply estimating the preference.  If that is
indeed the case, how do you feel about it?
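
To make the contrast concrete, here is a minimal sketch in plain Java, not Mahout code.
The class and method names (EvaluationStyleSketch, absoluteError, precision) are
hypothetical and chosen only for illustration. The first method scores a single estimated
preference against the real one, and the second checks how many recommended items are
relevant.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration; not part of Mahout.
final class EvaluationStyleSketch {

    /** Difference-style scoring: how far the estimate is from the real rating. */
    static double absoluteError(float estimated, float actual) {
        return Math.abs(estimated - actual);
    }

    /** Relevance-style scoring: fraction of recommended items that are relevant. */
    static double precision(List<Long> recommended, Set<Long> relevant) {
        if (recommended.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                hits++;
            }
        }
        return (double) hits / recommended.size();
    }

    public static void main(String[] args) {
        System.out.println(absoluteError(3.5f, 4.0f));

        List<Long> recommended = Arrays.asList(10L, 11L, 12L, 13L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(11L, 13L, 99L));
        System.out.println(precision(recommended, relevant));
    }
}
```

The two methods take different inputs: the first needs one estimate per held-out
preference, while the second needs a whole recommendation list and a relevant set for each
user.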
                
> Allow collaborative filtering evaluators to use custom logic in splitting data set
> ----------------------------------------------------------------------------------
>
>                 Key: MAHOUT-906
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-906
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Anatoliy Kats
>            Priority: Minor
>              Labels: features
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I want to start a discussion about factoring out the logic used in splitting the data
> set into training and testing sets.  Here is how things stand: there are two independent
> evaluator classes.  AbstractDifferenceRecommenderEvaluator splits all the preferences
> randomly into a training set and a testing set.  GenericRecommenderIRStatsEvaluator takes
> one user at a time, removes their top AT preferences, and counts how many of them the
> system recommends back.
> I have two use cases that both deal with temporal dynamics.  In one case, there may be
> expired items that can be used for building a training model, but not a test model.  In
> the other, I may want to simulate the behavior of a real system by building a preference
> matrix on days 1 to k, and testing on the ratings the user generated on day k+1.  In this
> case, it is not items but preferences (user, item, rating triplets) which may belong only
> to the training set.  Before we discuss appropriate design, are there any other use cases
> we need to keep in mind?
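
The second use case above (train on days 1 to k, test on day k+1) could be sketched roughly
as follows. This is plain Java under stated assumptions, not Mahout's data model:
TemporalSplitSketch, TimedPreference, Split and splitByDay are hypothetical names, and the
integer "day" field is an assumed stand-in for whatever timestamp the data actually carries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of this is Mahout API.
final class TemporalSplitSketch {

    /** One (user, item, rating) triplet plus the day it was recorded. */
    static final class TimedPreference {
        final long userId;
        final long itemId;
        final float rating;
        final int day;

        TimedPreference(long userId, long itemId, float rating, int day) {
            this.userId = userId;
            this.itemId = itemId;
            this.rating = rating;
            this.day = day;
        }
    }

    /** Holds the two halves of a split. */
    static final class Split {
        final List<TimedPreference> training = new ArrayList<>();
        final List<TimedPreference> test = new ArrayList<>();
    }

    /**
     * Preferences from days 1..k go to training, preferences from day k+1
     * go to test, and anything outside that range is dropped.
     */
    static Split splitByDay(List<TimedPreference> all, int k) {
        Split split = new Split();
        for (TimedPreference p : all) {
            if (p.day >= 1 && p.day <= k) {
                split.training.add(p);
            } else if (p.day == k + 1) {
                split.test.add(p);
            }
        }
        return split;
    }

    public static void main(String[] args) {
        List<TimedPreference> prefs = new ArrayList<>();
        prefs.add(new TimedPreference(1L, 10L, 4.0f, 1));
        prefs.add(new TimedPreference(1L, 11L, 3.0f, 2));
        prefs.add(new TimedPreference(2L, 10L, 5.0f, 3));
        prefs.add(new TimedPreference(2L, 12L, 2.0f, 4));

        Split split = splitByDay(prefs, 2);
        System.out.println("training=" + split.training.size()
                + " test=" + split.test.size());
    }
}
```

The point of the sketch is that the split decision is made per preference rather than per
item or per user, which is what distinguishes this use case from the expired-items one.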

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
