mahout-user mailing list archives

From WangRamon <ramon_w...@hotmail.com>
Subject RE: Recommend result contains item which user has already given preference, is that correct?
Date Fri, 21 Oct 2011 02:01:12 GMT

Hi Sebastian,

Unfortunately, I still get wrong data from the RecommenderJob after cleaning everything. Here is the relevant part of the recommendation result:

49 [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]

Now look at the input data for user 49 below: items 312611, 428914, 208617, 345206, 411909, 363683, 248872 and 494200 all appear in it, so they are wrong recommendations. Nearly all of the recommended items are wrong. I would like to send you the test data, but it is 50M+ in size. Can we discuss offline? Thank you very much.

49,409769,4
49,98795,4
49,262163,1
49,66009,4
49,414484,2
49,405329,3
49,312611,1
49,336441,4
49,136494,5
49,345206,3
49,479179,1
49,318960,4
49,52683,3
49,270840,3
49,264828,1
49,222390,4
49,456614,5
49,436207,5
49,306308,2
49,391582,5
49,494200,4
49,423328,3
49,112997,3
49,229347,5
49,474928,3
49,349350,1
49,208508,3
49,314397,2
49,14673,2
49,496041,4
49,301875,4
49,234234,1
49,325287,3
49,35756,5
49,365097,4
49,13376,4
49,333634,2
49,283494,5
49,208617,3
49,245390,1
49,221804,2
49,347821,3
49,138954,5
49,164206,5
49,72238,1
49,356632,1
49,452296,3
49,182288,5
49,499031,5
49,150727,4
49,240533,5
49,326081,4
49,220683,2
49,196527,2
49,177165,3
49,411709,5
49,360722,3
49,466310,1
49,160375,2
49,137203,5
49,32634,4
49,62134,5
49,96982,5
49,196951,1
49,304155,5
49,406109,4
49,244276,5
49,189552,1
49,442215,3
49,268806,2
49,364912,2
49,410896,5
49,450602,5
49,151703,1
49,248872,4
49,21684,1
49,41196,1
49,26614,2
49,369075,5
49,321916,1
49,325081,1
49,329877,4
49,344661,4
49,8429,3
49,69279,1
49,143695,1
49,229120,2
49,26298,4
49,54456,1
49,75937,4
49,87042,3
49,345383,5
49,363683,4
49,128047,3
49,234878,5
49,428914,3
49,353107,2
49,266850,4
49,421211,3
49,265739,4
49,303723,1
49,244575,4
49,303625,4
49,350481,5
49,63985,4
49,207327,3
49,397535,1
49,300916,5
49,358094,4
49,314919,5
49,309355,5
49,403169,5
49,90148,4
49,224056,4
49,359181,2
49,341927,5
49,436521,4
49,480682,4
49,315561,3
49,218647,5
49,245276,2
49,93189,1
49,204695,4
49,498350,5
49,155787,3
49,112730,3
49,416756,2
49,411909,4
49,253353,2
49,196663,5
49,40903,3
49,51873,2
49,66925,3
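
For anyone who wants to repeat this check on their own output, here is a minimal sketch of how the overlap can be computed. It is not part of Mahout: the class and method names are made up for illustration, and it assumes the output line looks like the one above (user ID followed by a bracketed list of item:value pairs) and the input lines are in userID,itemID,pref form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a Mahout class.
public class AlreadyPreferredCheck {

    /** Extracts the item IDs from an output line such as "49 [300420:5.0,312611:5.0]". */
    static List<Long> recommendedItems(String line) {
        String list = line.substring(line.indexOf('[') + 1, line.lastIndexOf(']'));
        List<Long> items = new ArrayList<>();
        if (list.isEmpty()) {
            return items;
        }
        for (String entry : list.split(",")) {
            items.add(Long.parseLong(entry.substring(0, entry.indexOf(':')).trim()));
        }
        return items;
    }

    /** Collects the item IDs the given user has rated, from "userID,itemID,pref" lines. */
    static Set<Long> preferredItems(long userId, List<String> prefLines) {
        Set<Long> items = new HashSet<>();
        for (String prefLine : prefLines) {
            String[] fields = prefLine.trim().split(",");
            if (fields.length >= 2 && Long.parseLong(fields[0]) == userId) {
                items.add(Long.parseLong(fields[1]));
            }
        }
        return items;
    }

    /** Returns the recommended items that the user has already given a preference for. */
    static List<Long> alreadyPreferred(String recLine, long userId, List<String> prefLines) {
        Set<Long> known = preferredItems(userId, prefLines);
        List<Long> overlap = new ArrayList<>();
        for (Long item : recommendedItems(recLine)) {
            if (known.contains(item)) {
                overlap.add(item);
            }
        }
        return overlap;
    }

    public static void main(String[] args) {
        // A shortened version of the data in this message.
        String rec = "49 [300420:5.0,312611:5.0,428914:5.0]";
        List<String> prefs = Arrays.asList("49,312611,1", "49,428914,3", "49,409769,4");
        System.out.println(alreadyPreferred(rec, 49L, prefs)); // prints [312611, 428914]
    }
}
```

An empty result would mean that no recommended item was already rated by that user.
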
> Date: Thu, 20 Oct 2011 18:40:38 +0200
> From: ssc@apache.org
> To: user@mahout.apache.org
> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> 
> To put it simplified:
> 
> The vector of recommendations is the sum of the similarity vectors for
> all preferred items. In each similarity vector for a preferred item the
> entry for that particular item is set to NaN.
> 
> That means that in the recommendation vector the entries for all
> preferred items will be NaN.
> 
> It's a neat trick that is unfortunately very hard to see in the code.
> 
> --sebastian
> 
> On 20.10.2011 18:36, WangRamon wrote:
> > 
> > Hi Sebastian
> > "But as the entry for the item itself is set to NaN in its similarity vector and NaN plus something stays always NaN, the predicted preference for an item that was already preferred is NaN. And the NaN entries are dropped later."
> > Wait a minute here. I can understand that NaN plus something always stays NaN, but how do you explain "the predicted preference for an item that was already preferred is NaN"? Where is the code that checks whether an item was already preferred? The only thing about NaN in SimilarityMatrixRowWrapperMapper is to say that two items (A to A) have a similarity of NaN, am I right?
> > Thanks
> > Ramon
> >> Date: Thu, 20 Oct 2011 17:04:20 +0200
> >> From: ssc@apache.org
> >> To: user@mahout.apache.org
> >> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> >>
> >> On 20.10.2011 16:57, WangRamon wrote:
> >>>
> >>> Hi Sebastian and Sean 
> >>> Thanks for your help. 
> >>>
> >>> I re-read the code (setting up an environment for debugging seems very difficult for me) and found this line in SimilarityMatrixRowWrapperMapper. I paste it below with its comment:
> >>>     /* remove self similarity */
> >>>     similarityMatrixRow.set(key.get(), Double.NaN);
> >>> I think the meaning is to mark the similarity between Item X and Item X (the identical one) as NaN, but it doesn't exclude Item X from recommendation. Then AggregateAndRecommendReducer uses simColumn.times(prefValue) as part of the formula to calculate the preferences for all items that are similar to Item i (it could be Item X or some other item), and returns the top 10 (default) for a user.
> >>> During this process, I cannot see any code that excludes an item the user has already given a preference for from recommendation.
> >>
> >> It's a little bit hidden :) For each preferred item, a vector of all its
> >> similarities is added:
> >>
> >>       numerators = numerators == null
> >>           ? (prefValue == BOOLEAN_PREF_VALUE ? simColumn.clone() : simColumn.times(prefValue))
> >>           : numerators.plus(prefValue == BOOLEAN_PREF_VALUE ? simColumn : simColumn.times(prefValue));
> >>
> >> But as the entry for the item itself is set to NaN in its similarity
> >> vector and NaN plus something stays always NaN, the predicted preference
> >> for an item that was already preferred is NaN. And the NaN entries are
> >> dropped later.
> >>
> >> --sebastian
> >>
> >>
> >>> Correct me if I missed something. Thank you, guys.
> >>> Cheers Ramon
> >>>> Date: Thu, 20 Oct 2011 13:59:28 +0100
> >>>> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> >>>> From: srowen@gmail.com
> >>>> To: user@mahout.apache.org
> >>>>
> >>>> Ah OK, figured as much. WangRamon does that answer your question
> >>>> and/or can you debug to see if this is happening, not happening for
> >>>> you in your use case?
> >>>>
> >>>> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <ssc@apache.org> wrote:
> >>>>> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
> >>>>> unit test that checks whether a user is only recommended unknown items
> >>>>> which still works.
> >>>  		 	   		  
> >>
> >  		 	   		  
> 
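
The NaN trick described in the quoted replies can be illustrated with plain arrays. This is only a sketch of the idea, not the Mahout code: it uses double[] instead of Mahout vectors, the class and method names are hypothetical, and it assumes a preference of 0 means "no preference given".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not a Mahout class.
public class NaNExclusionSketch {

    /**
     * Sums the preference-weighted similarity rows of all preferred items.
     * similarity[i][j] is the similarity of item i to item j; prefs[i] is the
     * user's preference for item i, or 0 if the user has not rated it
     * (an encoding assumed for this sketch only).
     */
    static double[] numerators(double[][] similarity, double[] prefs) {
        int n = prefs.length;
        double[] sum = new double[n];
        for (int i = 0; i < n; i++) {
            if (prefs[i] == 0.0) {
                continue; // only preferred items contribute a similarity row
            }
            double[] row = similarity[i].clone();
            row[i] = Double.NaN; // "remove self similarity"
            for (int j = 0; j < n; j++) {
                sum[j] += prefs[i] * row[j]; // NaN plus anything stays NaN
            }
        }
        return sum;
    }

    /** Keeps only the item indexes whose summed value is not NaN. */
    static List<Integer> candidates(double[] numerators) {
        List<Integer> keep = new ArrayList<>();
        for (int j = 0; j < numerators.length; j++) {
            if (!Double.isNaN(numerators[j])) {
                keep.add(j);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        double[][] sim = {
            {1.0, 0.5, 0.2},
            {0.5, 1.0, 0.4},
            {0.2, 0.4, 1.0}
        };
        double[] prefs = {4.0, 0.0, 2.0}; // the user rated items 0 and 2
        System.out.println(candidates(numerators(sim, prefs))); // prints [1]
    }
}
```

In this sketch every preferred item ends up with NaN in the summed vector, because its own row contributes NaN at its own index, so dropping the NaN entries leaves only items the user has not rated.
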
 		 	   		  