mahout-dev mailing list archives

From: Davide Romito <daviderom...@gmail.com>
Subject: Proposal about improving Evaluation in Recommender System
Date: Wed, 01 Feb 2012 18:18:26 GMT
Hi all,

this is the first time I’m writing to this mailing list. I’m Davide Romito,
and I hold an MS in Computer Engineering.

Over the last few weeks I’ve been using Mahout to build a Boolean user-based
collaborative recommender system, but I’m running into some problems.

I followed the suggestions in the book “Mahout in Action” to create a
Boolean collaborative recommender system and to calculate precision and
recall using the Mahout evaluator (pages 38-39).

I used the MovieLens dataset to evaluate precision and recall, as in the
book. My evaluator code is essentially the standard one:



       // Imports needed (Mahout Taste API)
       import java.io.File;
       import java.io.IOException;

       import org.apache.mahout.cf.taste.common.TasteException;
       import org.apache.mahout.cf.taste.eval.DataModelBuilder;
       import org.apache.mahout.cf.taste.eval.IRStatistics;
       import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
       import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
       import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
       import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
       import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
       import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
       import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
       import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
       import org.apache.mahout.cf.taste.model.DataModel;
       import org.apache.mahout.cf.taste.model.PreferenceArray;
       import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
       import org.apache.mahout.cf.taste.recommender.Recommender;
       import org.apache.mahout.cf.taste.similarity.UserSimilarity;
       import org.apache.mahout.common.RandomUtils;

       public static void main(String[] args) throws TasteException, IOException {

         RandomUtils.useTestSeed();

         int at = 10;

         // Boolean data model over the MovieLens training file
         DataModel model = new GenericBooleanPrefDataModel(
             GenericBooleanPrefDataModel.toDataMap(
                 new FileDataModel(new File("ua.base"))));

         GenericRecommenderIRStatsEvaluator evaluator =
             new GenericRecommenderIRStatsEvaluator();

         RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
           public Recommender buildRecommender(DataModel model) throws TasteException {
             UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);
             UserNeighborhood neighborhood = new NearestNUserNeighborhood(
                 model.getNumUsers(), similarity, model);
             return new GenericBooleanPrefUserBasedRecommender(model,
                 neighborhood, similarity);
           }
         };

         DataModelBuilder modelBuilder = new DataModelBuilder() {
           public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
             return new GenericBooleanPrefDataModel(
                 GenericBooleanPrefDataModel.toDataMap(trainingData));
           }
         };

         try {
           IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder,
               model, null, at,
               GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
           // Print inside the try block so we never dereference a null stats
           System.out.println(at + "," + stats.getPrecision() + "," + stats.getRecall());
         } catch (TasteException e) {
           e.printStackTrace();
         }
       }



To evaluate the recommender, I changed only the value of "at", from 1 to 10.
The results are:

@1  - Precision: 0.2258748674443268   Recall: 0.2258748674443268
@2  - Precision: 0.24231177094379663  Recall: 0.24231177094379663
@3  - Precision: 0.20749381406857542  Recall: 0.20749381406857542
@4  - Precision: 0.19406150583244947  Recall: 0.19406150583244947
@5  - Precision: 0.20084835630965006  Recall: 0.20084835630965006
@6  - Precision: 0.2101449275362318   Recall: 0.2101449275362318
@7  - Precision: 0.21330101499772755  Recall: 0.21330101499772755
@8  - Precision: 0.21487274655355282  Recall: 0.21487274655355282
@9  - Precision: 0.21809826793920137  Recall: 0.21809826793920137
@10 - Precision: 0.2218451749734886   Recall: 0.2218451749734886





From classical theory (Dietmar Jannach, Markus Zanker, Alexander Felfernig,
Gerhard Friedrich, “Recommender Systems: An Introduction”) we know that
Precision and Recall are computed as fractions of |hits_u|, the number of
correctly recommended relevant items for user u. Moreover, Precision should
decrease monotonically as Recall increases. If you try to plot the values
above, you will not obtain such a curve. Because of this, I think the way
Precision and Recall are computed (at least for the Boolean case) is not
correct.



Precision (P) relates the number of hits to the total number of recommended
items (|recset_u|):

       P_u = |hits_u| / |recset_u|

In contrast, Recall (R) is the ratio of hits to the theoretical maximum
number of hits given the test set size (|testset_u|):

       R_u = |hits_u| / |testset_u|





In the example above, Precision and Recall are always identical for a given
@. According to the theory, this can hold only if the denominators of the
two ratios (i.e., |recset_u| and |testset_u|) are the same.
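
As a quick numeric illustration (made-up numbers): if we recommend 10 items
to user u, u’s test set contains 30 relevant items, and 4 of the
recommendations are hits, then P_u = 4/10 = 0.4 while R_u = 4/30 ≈ 0.13.
The two values coincide only when |recset_u| = |testset_u|.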



I’d like to propose some changes/improvements to the existing code.

First of all, Mahout uses the value "at" to determine the size of
relevantItemIDs in the class GenericRecommenderIRStatsEvaluator (line 121).

For the recommendation, all the users who have rated more than 2*@ items
(line 115) are used. Then each user’s preferences are split into a training
set and a test set. The test set always has a size of @.

Usually 70% of the dataset is used for training and 30% for testing. I
think we should let the user choose which percentage of the dataset to use
for training and which for testing.
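
For example, with a 30% test split, a user with 50 preferences would get a
15-item test set and keep the remaining 35 preferences for training, rather
than a fixed test set of size @.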



For this, the first change is in the interface RecommenderIRStatsEvaluator,
where I’ve added an overloaded "evaluate" method (note the new double
testSetPercentage parameter):



       IRStatistics evaluate(RecommenderBuilder recommenderBuilder,
                             DataModelBuilder dataModelBuilder,
                             DataModel dataModel,
                             IDRescorer rescorer,
                             int at,
                             double relevanceThreshold,
                             double evaluationPercentage,
                             double testSetPercentage) throws TasteException;
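
For illustration, here is how the evaluator from my first example could be
invoked through the proposed overload (a sketch; the last argument is the
new testSetPercentage):

       // Evaluate every user (evaluationPercentage = 1.0), let Mahout pick
       // the relevance threshold, hold out 30% of each user's prefs as test set
       IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder,
           model, null, at,
           GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,
           1.0, 0.3);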



Then, in the class GenericRecommenderIRStatsEvaluator I made the following
changes (see the MODIFICATIONS comments):





       LongPrimitiveIterator it = dataModel.getUserIDs();
       while (it.hasNext()) {

         long userID = it.nextLong();

         if (random.nextDouble() >= evaluationPercentage) {
           // Skipped
           continue;
         }

         long start = System.currentTimeMillis();

         PreferenceArray prefs = dataModel.getPreferencesFromUser(userID);
         int size = prefs.length();
         if (size < 2 * at) {
           // Really not enough prefs to meaningfully evaluate this user
           continue;
         }

         // MODIFICATIONS: if a test set percentage is given, use it to size
         // the per-user test set; otherwise fall back to the old behaviour
         // (test set of size at)
         int testSetSize;
         if (Double.isNaN(testSetPercentage)) {
           testSetSize = at;
         } else {
           testSetSize = (int) (prefs.length() * testSetPercentage);
         }

         // MODIFICATIONS: use testSetSize to size relevantItemIDs
         FastIDSet relevantItemIDs = new FastIDSet(testSetSize);

         // List some most-preferred items that would count as (most)
         // "relevant" results
         double theRelevanceThreshold = Double.isNaN(relevanceThreshold)
             ? computeThreshold(prefs)
             : relevanceThreshold;

         prefs.sortByValueReversed();

         // MODIFICATIONS: collect up to testSetSize relevant items instead of at
         for (int i = 0; i < testSetSize; i++) {
           if (prefs.getValue(i) >= theRelevanceThreshold) {
             relevantItemIDs.add(prefs.getItemID(i));
           }
         }

         int numRelevantItems = relevantItemIDs.size();
         if (numRelevantItems <= 0) {
           continue;
         }

         FastByIDMap<PreferenceArray> trainingUsers =
             new FastByIDMap<PreferenceArray>(dataModel.getNumUsers());
         LongPrimitiveIterator it2 = dataModel.getUserIDs();
         while (it2.hasNext()) {
           processOtherUser(userID, relevantItemIDs, trainingUsers,
               it2.nextLong(), dataModel);
         }

         DataModel trainingModel = dataModelBuilder == null
             ? new GenericDataModel(trainingUsers)
             : dataModelBuilder.buildDataModel(trainingUsers);
         Recommender recommender = recommenderBuilder.buildRecommender(trainingModel);

         try {
           trainingModel.getPreferencesFromUser(userID);
         } catch (NoSuchUserException nsee) {
           continue; // Oops we excluded all prefs for the user -- just move on
         }

         int intersectionSize = 0;
         List<RecommendedItem> recommendedItems =
             recommender.recommend(userID, at, rescorer);
         for (RecommendedItem recommendedItem : recommendedItems) {
           if (relevantItemIDs.contains(recommendedItem.getItemID())) {
             intersectionSize++;
           }
         }

         int numRecommendedItems = recommendedItems.size();

         // Precision
         …
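
For context, the elided part is unchanged. It accumulates the per-user
statistics roughly as follows (a sketch of the existing Mahout logic from
memory, not part of my modifications):

       // Precision: hits over the number of items actually recommended
       if (numRecommendedItems > 0) {
         precision.addDatum((double) intersectionSize / (double) numRecommendedItems);
       }
       // Recall: hits over the number of relevant (held-out) items
       recall.addDatum((double) intersectionSize / (double) numRelevantItems);

With the unmodified code both denominators are effectively tied to @, which
explains why precision and recall came out identical; with testSetSize the
recall denominator now reflects the real test set size.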



Using 30% of the dataset as the test set, the values of precision and
recall are:

@1  - Precision: 0.41887592788971406  Recall: 0.02203446639760426
@2  - Precision: 0.36585365853658586  Recall: 0.038498761823094016
@3  - Precision: 0.35348179568752225  Recall: 0.054509155919394
@4  - Precision: 0.34172852598091197  Recall: 0.07081668395014626
@5  - Precision: 0.3346765641569464   Recall: 0.08682432326362303
@6  - Precision: 0.34554678692220986  Recall: 0.09602074799164206
@7  - Precision: 0.3492628368073204   Recall: 0.10585290948384189
@8  - Precision: 0.35034119106699724  Recall: 0.1117366914783968
@9  - Precision: 0.3519239083441425   Recall: 0.11677870191013658
@10 - Precision: 0.3528225806451612   Recall: 0.12537365741292747


In this case, precision and recall differ for a given @, and recall
increases with @ as expected. Moreover, this time the Precision vs. Recall
values can actually be plotted as a curve.



Another modification concerns the maximum allowed value of @.

For example, in the MovieLens dataset every user has at least 20 ratings,
so the evaluation is performed on all the users in the dataset (943 users)
up to @10 (i.e., 20/2). However, if we try to evaluate precision and recall
at @11, only 887 users qualify, so precision and recall are calculated on a
subset of the dataset, and the values can be quite unexpected. In other
words, after a monotonic Precision vs. Recall trend up to @10, the values
start to oscillate because the set of evaluated users changes size.



To avoid calculating precision and recall over varying subsets of users,
I've inserted a new block that computes the maximum allowed @ (atMax) and
skips the evaluation for values of @ greater than atMax / 2:

       // Block to determine the maximum allowed value of at: the smallest
       // number of items rated by any user
       int atMax = Integer.MAX_VALUE;
       LongPrimitiveIterator itMaxItem = dataModel.getUserIDs();
       while (itMaxItem.hasNext()) {
         long user = itMaxItem.nextLong();
         int numItems = dataModel.getItemIDsFromUser(user).size();
         if (atMax > numItems) {
           atMax = numItems;
         }
       }

       // If at is greater than atMax / 2, return Double.NaN for precision,
       // recall, fall-out and nDCG
       if (at > atMax / 2) {
         return new IRStatisticsImpl();
       } else {
         LongPrimitiveIterator it = dataModel.getUserIDs();
         …


I’ve also added a new constructor in the class IRStatisticsImpl:

       IRStatisticsImpl() {
         this.precision = Double.NaN;
         this.recall = Double.NaN;
         this.fallOut = Double.NaN;
         this.ndcg = Double.NaN;
       }
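
Calling code can then detect the “@ too large” case by checking for NaN,
e.g. (a sketch):

       if (Double.isNaN(stats.getPrecision())) {
         // at was greater than atMax / 2 -- nothing was evaluated
         System.out.println("at=" + at + " is too large for this dataset");
       }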

The reason for the check

       if (at > atMax / 2)

is the same as the one found at line 113 of GenericRecommenderIRStatsEvaluator:

       PreferenceArray prefs = dataModel.getPreferencesFromUser(userID);
       int size = prefs.length();
       if (size < 2 * at) {
         // Really not enough prefs to meaningfully evaluate this user
         continue;
       }



There, the evaluation is done only for users who have at least 2*at
preferences.

So if atMax is 20, all users have at least 20 preferences, and the maximum
@ is therefore 10.



I would really appreciate your comments and opinions on the issues and
proposed solutions above.



Thanks,

Davide Romito
