mahout-user mailing list archives

From Sean Owen <>
Subject Re: Mahout performance issues
Date Thu, 01 Dec 2011 22:51:24 GMT
(Agree, and the sampling happens at the user level now -- so if you sample
one of these users, it slows down a lot. The spirit of the proposed change
is to make sampling more fine-grained, at the individual item level. That
seems certain to fix this.)
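A minimal sketch of item-level sampling in plain Java. It assumes a boolean data model held as two hypothetical maps (`itemsByUser`, `usersByItem`) rather than Mahout's own `DataModel`; the class name, method names and the `maxUsersPerItem` cap are illustrative, not Mahout's API. Instead of sampling (or skipping) a whole user, each of the target user's items pulls in at most `maxUsersPerItem` randomly chosen co-raters, so one very popular item can no longer drag in every user who rated it.

```java
import java.util.*;

// Hypothetical sketch, not Mahout's API: plain maps stand in for a boolean DataModel.
public class ItemLevelSampling {

    /** Reservoir sampling: returns at most k elements chosen uniformly at random. */
    static <T> List<T> sample(Collection<T> items, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>(Math.min(k, items.size()));
        int seen = 0;
        for (T item : items) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item);
            } else {
                int j = rnd.nextInt(seen); // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, item); // keep the new element with probability k/seen
                }
            }
        }
        return reservoir;
    }

    /** Candidate items for userID, pulling in at most maxUsersPerItem co-raters per item. */
    static Set<Long> candidateItems(long userID,
                                    Map<Long, Set<Long>> itemsByUser,
                                    Map<Long, Set<Long>> usersByItem,
                                    int maxUsersPerItem,
                                    Random rnd) {
        Set<Long> ownItems = itemsByUser.getOrDefault(userID, Collections.emptySet());
        Set<Long> candidates = new HashSet<>();
        for (long itemID : ownItems) {
            Set<Long> raters = usersByItem.getOrDefault(itemID, Collections.emptySet());
            // Sampling happens per item, so a popular item is capped
            // instead of the whole user being skipped or fully expanded.
            for (long otherUser : sample(raters, maxUsersPerItem, rnd)) {
                if (otherUser != userID) {
                    candidates.addAll(itemsByUser.getOrDefault(otherUser, Collections.emptySet()));
                }
            }
        }
        candidates.removeAll(ownItems); // don't recommend items the user already has
        return candidates;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> itemsByUser = Map.of(
                1L, Set.of(10L, 20L), 2L, Set.of(10L, 30L), 3L, Set.of(20L, 40L));
        Map<Long, Set<Long>> usersByItem = Map.of(
                10L, Set.of(1L, 2L), 20L, Set.of(1L, 3L), 30L, Set.of(2L), 40L, Set.of(3L));
        Set<Long> c = candidateItems(1L, itemsByUser, usersByItem, 100, new Random(42));
        System.out.println(new TreeSet<>(c)); // prints [30, 40]
    }
}
```

The sampling pass still iterates over every rater of an item, but only the sampled raters have their item sets expanded, which is where the cost was. Results become nondeterministic unless the `Random` is seeded.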

On Thu, Dec 1, 2011 at 10:46 PM, Ted Dunning <> wrote:

> This may or may not help much.  My guess is that the improvement will be
> very modest.
> The most serious problem is going to be recommendations for anybody who has
> rated one of these excessively popular items.  That item will bring in a
> huge number of other users and thus a huge number of items to consider.  If
> you down-sample ratings of the prolific users and kill super-common items,
> I think you will see much more improvement than simply eliminating the
> singleton users.
> The basic issue is that cooccurrence-based algorithms have run time
> O(n_max^2), where n_max is the maximum number of items per user.
> On Thu, Dec 1, 2011 at 2:35 PM, Daniel Zohar <> wrote:
> > This is why I'm now looking into changing GenericBooleanPrefDataModel so
> > that it ignores users who made only one interaction when building the
> > 'preferenceForItems' Map. What do you think about this approach?
> >
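The pruning Ted describes (down-sampling prolific users, dropping super-common items), plus Daniel's singleton filter, can be sketched as a preprocessing pass over a boolean user-to-items map. This is a hypothetical helper, not part of Mahout or of `GenericBooleanPrefDataModel`; the thresholds `maxItemsPerUser` and `maxUsersPerItem` are assumed tuning knobs. `pairWork` counts the item pairs a cooccurrence pass must touch, the sum over users of n_u(n_u - 1)/2, which is where the O(n_max^2) per-user cost comes from.

```java
import java.util.*;

// Hypothetical preprocessing helper, not part of Mahout or GenericBooleanPrefDataModel.
public class PrunePreferences {

    /** Item pairs a cooccurrence pass touches: the sum over users of n_u * (n_u - 1) / 2. */
    static long pairWork(Map<Long, Set<Long>> itemsByUser) {
        long work = 0;
        for (Set<Long> items : itemsByUser.values()) {
            long n = items.size();
            work += n * (n - 1) / 2;
        }
        return work;
    }

    static Map<Long, Set<Long>> prune(Map<Long, Set<Long>> itemsByUser,
                                      int maxItemsPerUser,
                                      int maxUsersPerItem,
                                      Random rnd) {
        // Count how many users interacted with each item (on the unpruned data).
        Map<Long, Integer> popularity = new HashMap<>();
        for (Set<Long> items : itemsByUser.values()) {
            for (long item : items) {
                popularity.merge(item, 1, Integer::sum);
            }
        }
        Map<Long, Set<Long>> pruned = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : itemsByUser.entrySet()) {
            // Drop super-common items.
            List<Long> kept = new ArrayList<>();
            for (long item : e.getValue()) {
                if (popularity.get(item) <= maxUsersPerItem) {
                    kept.add(item);
                }
            }
            // Down-sample prolific users to a random subset of maxItemsPerUser items.
            if (kept.size() > maxItemsPerUser) {
                Collections.shuffle(kept, rnd);
                kept = kept.subList(0, maxItemsPerUser);
            }
            // Users left with fewer than 2 items contribute no cooccurrence pairs.
            if (kept.size() >= 2) {
                pruned.put(e.getKey(), new HashSet<>(kept));
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> data = Map.of(
                1L, Set.of(10L, 20L, 30L),
                2L, Set.of(10L, 20L),
                3L, Set.of(10L),
                4L, Set.of(10L, 40L, 50L, 60L, 70L));
        Map<Long, Set<Long>> pruned = prune(data, 3, 3, new Random(42));
        // prints "pair work before: 14, after: 4"
        System.out.println("pair work before: " + pairWork(data) + ", after: " + pairWork(pruned));
    }
}
```

Capping each user at `maxItemsPerUser` = k bounds that user's contribution at k(k - 1)/2 pairs. As in Daniel's proposal, the singleton filter is meant only for the structure used for cooccurrence counting; removing those users from the model entirely would also stop them from receiving recommendations.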
