mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Taste-GenericItemBasedRecommender
Date Tue, 08 Sep 2009 21:11:55 GMT
The data structure does bear on this, actually.

You and I see these algorithms a bit differently, I know, which may explain
some of our difference in whether this comes from the data structures.

I see essentially all recommendation algorithms as being patterned on matrix
multiplication of the user x item interaction matrix.  The simplest case is
pure item-based recommendation with no fancy stats for sparsification or
frequency weighting.

In this case, the interaction matrix A is turned into a recommendation
matrix using matrix multiplication: A' A.  To implement recommendation given
a user history u, we recommend the items that have the largest values in (A'
A) u.

If A is sparse, then so should be A' A, although to a lesser degree.
Typically, u will also be sparse.  This means that the computation of the
recommendation using (A' A) u should use entirely sparse algorithms.

So which items will be considered for recommendation at recommendation-time?

The answer is that A'A will have non-zero elements wherever a row (user) of
A has two non-zero items.  Wherever one of these items is an item in u, then
the other item will be considered for recommendation.

Does this make sense?

In practice, of course, (A'A) won't be used directly, but instead will be
further sparsified or replaced by some decomposition into an alternate
coordinate representation.

On Tue, Sep 8, 2009 at 1:26 PM, Sean Owen <srowen@gmail.com> wrote:

> Not sure it has a lot to do with data structures? It's not, well,
> obvious that one doesn't need to consider every item for
> recommendation -- that's the canonical way the algorithms work. In
> practice, there's a shortcut.
>
> Technically, it's kind of wrong. There's nothing about recommender
> engines per se that dictate you couldn't recommend items that aren't
> connected by some co-occurrence. But in practice, it's pretty fine to
> take the shortcut.
>
> This is probably something my brain should have gotten to earlier,
> but, not really sure it is suggestive of another problem? unless I
> miss something.
>
> On Tue, Sep 8, 2009 at 9:15 PM, Ted Dunning<ted.dunning@gmail.com> wrote:
> > These issues should have been handled pretty transparently by the sparse
> > data structures.
> >
> > They they were not is a red flag that there should be a bit of thought
> > applied here.
> >
> > On Tue, Sep 8, 2009 at 5:44 AM, Sean Owen <srowen@gmail.com> wrote:
> >
> >> I wanted the user base to take note of the change Gokhan suggests
> >> below. I committed a variant of it just now which does indeed notably
> >> speed up most algorithms by more intelligently selecting possibilities
> >> to consider. On one test I am playing with it sped things up by 50% --
> >> in another, more like 400%. Depending on your data this could be a big
> >> win.
> >>
> >> Sean
> >>
> >> On Mon, Sep 7, 2009 at 2:03 PM, Gökhan Çapan<gkhncpn@gmail.com> wrote:
> >> > Hi, Sean.
> >> > I think we talked about mostSimilarItems( ) function before, about a
> bug
> >> in
> >> > ItemBasedRecommender.
> >> > I think there is another issue, about performance.
> >> >
> >> > mostSimilarItems function gives the list of most similar items to a
> given
> >> > item.
> >> > In computation of those items, the algorithm looks at all other items
> in
> >> > data model, but if there is no user that doesn't rate 2 items together
> it
> >> is
> >> > needless to look if there is a similarity between active item and that
> >> item.
> >> >
> >> >
> >> >
> >> > That is the original function that returns most similar items list in
> >> > cf.taste.impl.recommender.GenericItemBasedRecommender:
> >> >
> >> >  private List<RecommendedItem> doMostSimilarItems(long itemID,
> >> >                                                    int howMany,
> >> >
> >> TopItems.Estimator<Long>
> >> > estimator) throws TasteException {
> >> >     DataModel model = getDataModel();
> >> >     FastIDSet allItemIDs = new FastIDSet(model.getNumItems());
> >> >     LongPrimitiveIterator it = model.getItemIDs();
> >> >
> >> >
> >> >     while (it.hasNext()) {
> >> >       allItemIDs.add(it.nextLong());
> >> >     }
> >> >     allItemIDs.remove(itemID);
> >> >     return TopItems.getTopItems(howMany, allItemIDs.iterator(), null,
> >> > estimator);
> >> >   }
> >> >
> >> >
> >> >
> >> >
> >> > I updated and use it that way:
> >> >  private List<RecommendedItem> doMostSimilarItems(long itemID,
> >> >                                                    int howMany,
> >> >
> >> TopItems.Estimator<Long>
> >> > estimator) throws TasteException {
> >> >     DataModel model = getDataModel();
> >> >
> >> >       FastIDSet set=new FastIDSet();
> >> >       PreferenceArray arr=model.getPreferencesForItem(itemID);
> >> >       for(int i=0;i<arr.length();i++){
> >> >
> set.addAll(model.getItemIDsFromUser(arr.get(i).getUserID()));
> >> >       }
> >> >       set.remove(itemID);
> >> >       return
> TopItems.getTopItems(howMany,set.iterator(),null,estimator);
> >> >   }
> >> >
> >> >
> >> >
> >> > The only difference between two function is:
> >> > the original one passes all items to getTopItems
> >> > mine passes only the items that have at least one user who've rated
> both
> >> > active item and that item.
> >> >
> >> >
> >> >
> >> > This little change made the algorithm pretty faster
> >> > (For my data set it runs 4 times faster now.)
> >> >
> >> > I wanted to inform you, if you want to try and update the code.
> >> > If for another reason original version of the code is better, please
> make
> >> me
> >> > know.
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
>



-- 
Ted Dunning, CTO
DeepDyve

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message