mahout-user mailing list archives

From Pat Ferrel <pat.fer...@gmail.com>
Subject Re: Decaying score for old preferences when using the .refresh()
Date Thu, 07 Nov 2013 20:08:59 GMT
Not sure how you are going to decay in Mahout. Once ingested into Mahout there are no timestamps.
So you’ll have to do that before ingesting.

Last year we set up an e-commerce, department-store-type recommender with data from online user
purchase, add-to-cart, and view events. The data was actual user behavior in the online store before
any recommender was implemented, so it was very clean of external effects. We took varying time
slices and measured the change in precision of Mahout’s item-based recommendations. We found that
precision always increased with more data, up to our max of 1 year. Put another way, we tried
3, 6, 9, and 12 months of data, and 12 months produced the best results. All we did was filter
out items no longer in stock. We did nothing to decay preferences.

That said you can still make a good case to limit or decay user preferences used in the queries.
The problem is you may not want to have the same limit on data used to build the model. The
model data represents users’ taste similarities, which change very slowly. I don’t know
of a way to have a short time span user preference query against a long time span model in
Mahout, as Gokhan says.

If you care to hack Mahout you can use different data in the recommendation pipeline. Mahout
uses the user preference matrix to calculate item-item similarities and puts them in a DRM
(distributed row matrix). Then it uses the user’s preference data taken from the preference
matrix as a sort of query against the item-item DRM. If you use your own truncated (or decayed)
user preference vectors as the queries instead of the ones that were used to train the item-item
DRM, you would get the result you are trying for without throwing out potentially important
training data.
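
To make the idea concrete, here is a minimal plain-Java sketch, not Mahout code. It assumes the item-item similarities have already been computed from the full history and are available as an in-memory map. The class name, method names, and the halfLifeDays parameter are all hypothetical, and exponential half-life decay is just one possible choice.

```java
import java.util.HashMap;
import java.util.Map;

class DecayedQuerySketch {

    /**
     * Builds the query vector: each preference is scaled by 0.5^(age / halfLife).
     * halfLifeDays is a hypothetical tuning knob, not a Mahout parameter.
     */
    static Map<Long, Double> decayPreferences(Map<Long, Double> prefs,
                                              Map<Long, Long> prefTimesMillis,
                                              long nowMillis,
                                              double halfLifeDays) {
        Map<Long, Double> decayed = new HashMap<>();
        for (Map.Entry<Long, Double> e : prefs.entrySet()) {
            double ageDays = (nowMillis - prefTimesMillis.get(e.getKey())) / 86_400_000.0;
            decayed.put(e.getKey(), e.getValue() * Math.pow(0.5, ageDays / halfLifeDays));
        }
        return decayed;
    }

    /**
     * Scores candidate items by multiplying the query vector against the
     * item-item similarity rows, which were trained on the full history.
     */
    static Map<Long, Double> score(Map<Long, Map<Long, Double>> itemSimilarities,
                                   Map<Long, Double> query) {
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Double> q : query.entrySet()) {
            Map<Long, Double> row = itemSimilarities.get(q.getKey());
            if (row == null) {
                continue; // no similarity data for this item
            }
            for (Map.Entry<Long, Double> s : row.entrySet()) {
                if (query.containsKey(s.getKey())) {
                    continue; // skip items the user has already interacted with
                }
                scores.merge(s.getKey(), q.getValue() * s.getValue(), Double::sum);
            }
        }
        return scores;
    }
}
```

Truncation instead of decay would simply drop entries older than some cutoff before calling score; the similarity map is untouched either way.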

By decaying the user preferences you may get a lower precision score, but that is only a crude
measure of goodness. The recs for recent user activity will probably result in more sales
since they indicate recent user intent. You can measure this later with A/B testing if you
want.

On Nov 7, 2013, at 12:50 AM, Gokhan Capan <gkhncpn@gmail.com> wrote:

Cassio,

I am not sure if there are direct/indirect ways to do this with existing
code.

Recall that an item neighborhood based score prediction, in simplest terms,
is a weighted average of the active user's ratings on other items, where
the weights are item-to-item similarities. Applying a decay function to
these item-to-item weights, where the decay is based on the time at which
the active user rated each "other item", can help to achieve this.
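
Written out, the decayed weighted average could look like the following. This is a sketch under assumptions: the exponential form and the decay rate \lambda are one common choice rather than something fixed by the text above, N(u) is the set of items the active user u has rated, s_{ij} is the item-to-item similarity, r_{uj} is the user's rating of item j, t_{uj} is the time of that rating, and t is the current time.

```latex
\hat{r}_{ui} =
  \frac{\sum_{j \in N(u)} e^{-\lambda (t - t_{uj})} \, s_{ij} \, r_{uj}}
       {\sum_{j \in N(u)} e^{-\lambda (t - t_{uj})} \, s_{ij}}
```

Setting \lambda = 0 recovers the plain similarity-weighted average.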

One consideration: for users who do not change their rating behavior
much, this decay can mask valuable historical information.

This particular approach is discussed, and shown to increase accuracy,
in "Collaborative Filtering with Temporal Dynamics" by Yehuda Koren. The
decay function is parameterized per user, keeping track of how consistent
the user behavior is.

If you think it is not necessary to estimate those per user parameters, in
Mahout's current neighborhood based recommenders, you might apply that
decay to item-to-item similarities at "recommendation time". Note that
DataModel#getPreferenceTime is the method you require. If you're using a
GenericItemBasedRecommender directly,
GenericItemBasedRecommender#doEstimatePreference is where your edits would
go. The benefit here is not having to update item-to-item similarities, so
you can still cache them.
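
As a rough illustration of the recommendation-time decay, here is a standalone Java sketch. It does not use or reproduce Mahout's classes: TimedPreference, estimate, and lambdaPerDay are hypothetical names, a single global decay rate stands in for the per-user parameters, and similarities are assumed to be positive. In a real edit, the timestamp would presumably come from DataModel#getPreferenceTime.

```java
import java.util.Map;

class TimeDecayedEstimateSketch {

    /** Hypothetical holder for one past preference of the active user. */
    static final class TimedPreference {
        final long itemId;
        final double value;
        final long timestampMillis;

        TimedPreference(long itemId, double value, long timestampMillis) {
            this.itemId = itemId;
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Estimates a preference for a target item as a weighted average of the
     * user's other preferences. Each cached similarity is multiplied by
     * exp(-lambdaPerDay * ageInDays) at recommendation time, so the cached
     * similarities themselves are never modified.
     */
    static double estimate(Iterable<TimedPreference> userPrefs,
                           Map<Long, Double> similaritiesToTarget,
                           long nowMillis,
                           double lambdaPerDay) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (TimedPreference p : userPrefs) {
            Double sim = similaritiesToTarget.get(p.itemId);
            if (sim == null) {
                continue; // no similarity between this item and the target
            }
            double ageDays = (nowMillis - p.timestampMillis) / 86_400_000.0;
            double weight = sim * Math.exp(-lambdaPerDay * ageDays);
            weightedSum += weight * p.value;
            totalWeight += weight;
        }
        // NaN signals that no estimate is possible.
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }
}
```

With lambdaPerDay set to 0 this reduces to the undecayed weighted average, which makes it easy to compare the two in an offline evaluation.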


Gokhan


On Wed, Nov 6, 2013 at 6:32 PM, Cassio Melo <melo.cassio@gmail.com> wrote:

> Assuming that most recent ratings or implicit preference data is more
> important than the older ones, I wonder if there is a way to decrease the
> importance (score) of old preference entries without having to update all
> previous preferences.
> 
> Currently I'm fetching new preferences from time to time and using the
> .refresh() method to update the data model with the new values.
> 
> Thanks
> 

