mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Mahout performance issues
Date Sun, 04 Dec 2011 13:42:09 GMT
To talk about this clearly, let me go back to my example and add to it:

---
Say we're recommending for user A. User A is connected to items 1, 2, 3.
Those items are connected to other users X, Y, Z. And those users in turn
are connected to items 100, 101, 102, 103, and so on. You can down-sample
four things:

1. The 1, 2, 3
2. The X, Y, Z
3. The 100, 101, 102, ...
4. The result of down-sampling 1-3, again
---
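Here is a minimal Java sketch of the walk in the example above, with a separate cap at each of levels #1 to #3. It does not use Mahout's API. The class, method and parameter names (`candidateItems`, `maxItemsOfUser`, `maxUsersPerItem`, `maxItemsPerUser`) are hypothetical, and the data lives in plain in-memory maps. Down-sampling is done by shuffling and truncating, which is only one way to pick a uniform random subset.

```java
import java.util.*;

public class CandidateSamplingSketch {

  // Keep at most `limit` elements, chosen uniformly at random (shuffle, then truncate).
  static <T> List<T> sample(List<T> values, int limit, Random random) {
    if (values.size() <= limit) {
      return values;
    }
    List<T> copy = new ArrayList<>(values);
    Collections.shuffle(copy, random);
    return copy.subList(0, limit);
  }

  // Walk user -> items (#1) -> users (#2) -> items (#3), down-sampling each level.
  // Hypothetical helper, not Mahout's SamplingCandidateItemStrategy.
  static Set<Long> candidateItems(String user,
                                  Map<String, List<Long>> itemsOfUser,
                                  Map<Long, List<String>> usersOfItem,
                                  int maxItemsOfUser, int maxUsersPerItem, int maxItemsPerUser,
                                  Random random) {
    List<Long> ownItems = itemsOfUser.getOrDefault(user, List.of());
    Set<Long> candidates = new HashSet<>();
    for (Long item : sample(ownItems, maxItemsOfUser, random)) {          // #1: 1, 2, 3
      List<String> raters = usersOfItem.getOrDefault(item, List.of());
      for (String other : sample(raters, maxUsersPerItem, random)) {      // #2: X, Y, Z
        List<Long> theirItems = itemsOfUser.getOrDefault(other, List.of());
        candidates.addAll(sample(theirItems, maxItemsPerUser, random));   // #3: 100, 101, ...
      }
    }
    candidates.removeAll(ownItems); // never recommend items the user already has
    return candidates;
  }

  // Build the item -> users index from the user -> items index.
  static Map<Long, List<String>> invert(Map<String, List<Long>> itemsOfUser) {
    Map<Long, List<String>> usersOfItem = new HashMap<>();
    itemsOfUser.forEach((u, items) -> {
      for (Long item : items) {
        usersOfItem.computeIfAbsent(item, k -> new ArrayList<>()).add(u);
      }
    });
    return usersOfItem;
  }

  // Toy data shaped like the example: A -> 1, 2, 3; X, Y, Z share those items.
  static Map<String, List<Long>> exampleData() {
    Map<String, List<Long>> itemsOfUser = new HashMap<>();
    itemsOfUser.put("A", List.of(1L, 2L, 3L));
    itemsOfUser.put("X", List.of(1L, 100L, 101L));
    itemsOfUser.put("Y", List.of(2L, 102L));
    itemsOfUser.put("Z", List.of(3L, 103L));
    return itemsOfUser;
  }

  public static void main(String[] args) {
    Map<String, List<Long>> itemsOfUser = exampleData();
    Map<Long, List<String>> usersOfItem = invert(itemsOfUser);
    Random random = new Random(42);
    int none = Integer.MAX_VALUE; // "no sampling at this level"
    System.out.println(candidateItems("A", itemsOfUser, usersOfItem, none, none, none, random));
    System.out.println(candidateItems("A", itemsOfUser, usersOfItem, none, none, 1, random));
  }
}
```

With no caps, the toy data yields items 100 to 103. If #3 is capped at one item per user, at most three of them remain, because only X, Y and Z can contribute new items. Leaving a cap at `Integer.MAX_VALUE` turns sampling off for that level. This means variants that sample only some of #1 to #3 are just different argument combinations. #4 is not modelled here.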

The current implementation samples #2. My proposal samples #2 and #3.
Sebastian's samples #3. Your proposal does #2 and #4. I believe that doing
all 4 is redundant. You probably need to do at least #2 and #3 to avoid the
prolific-user and prolific-item problem.

The reason you are still seeing a fair number of IDs is that my
implementation does not also sample #1.
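A back-of-the-envelope bound shows why leaving #1 unsampled matters. The number of candidate IDs visited before de-duplication is at most (items of A kept at #1) × (cap at #2) × (cap at #3). The bound is reached when every level is filled up to its cap, and with #1 unsampled it grows linearly with A's history. The method name and the numbers below are hypothetical, not taken from the thread.

```java
public class CandidateBound {

  /** Sentinel meaning "this level is not sampled". */
  static final int UNLIMITED = Integer.MAX_VALUE;

  // Upper bound on candidate IDs visited (before de-duplication) when user A
  // has `itemsOfA` items and levels #1, #2, #3 are capped as given.
  static long worstCase(int itemsOfA, int maxItemsOfA, int maxUsersPerItem, int maxItemsPerUser) {
    long level1 = Math.min(itemsOfA, maxItemsOfA);     // #1: A's own items actually kept
    return level1 * maxUsersPerItem * maxItemsPerUser; // times the #2 and #3 caps
  }

  public static void main(String[] args) {
    // Hypothetical caps: 10 users per item (#2), 2 items per user (#3).
    System.out.println(worstCase(50, UNLIMITED, 10, 2));  // 1000
    System.out.println(worstCase(500, UNLIMITED, 10, 2)); // 10000
    System.out.println(worstCase(500, 20, 10, 2));        // 400: #1 capped at 20
  }
}
```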

I suggest that we still have one solution for this, since these are all
small variants on the same theme, and that we put it in
SamplingCandidateItemStrategy.

To me, the remaining question is just: which of these 4 do you want to do?
I suggest 2, 3, and maybe 1.
Follow-on question: should we make the limits separately settable for each,
or does that add complexity without much use?
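One way to picture the follow-on question is a hypothetical settings class that accepts one cap per level, plus a convenience constructor that applies a single shared cap to all three. `SamplingLimits` is an illustrative name, not an existing Mahout class.

```java
public class SamplingLimitsSketch {

  // Hypothetical settings holder: one cap per level of the walk.
  static final class SamplingLimits {
    final int maxItemsOfUser;   // #1: the recommending user's own items
    final int maxUsersPerItem;  // #2: users reached from each of those items
    final int maxItemsPerUser;  // #3: items reached from each of those users

    SamplingLimits(int maxItemsOfUser, int maxUsersPerItem, int maxItemsPerUser) {
      if (maxItemsOfUser < 1 || maxUsersPerItem < 1 || maxItemsPerUser < 1) {
        throw new IllegalArgumentException("every limit must be at least 1");
      }
      this.maxItemsOfUser = maxItemsOfUser;
      this.maxUsersPerItem = maxUsersPerItem;
      this.maxItemsPerUser = maxItemsPerUser;
    }

    // The simpler option: a single limit shared by all three levels.
    SamplingLimits(int sharedLimit) {
      this(sharedLimit, sharedLimit, sharedLimit);
    }
  }

  public static void main(String[] args) {
    SamplingLimits separate = new SamplingLimits(50, 10, 2);
    SamplingLimits shared = new SamplingLimits(10);
    System.out.println(separate.maxItemsOfUser + " " + separate.maxUsersPerItem + " " + separate.maxItemsPerUser);
    System.out.println(shared.maxItemsOfUser + " " + shared.maxUsersPerItem + " " + shared.maxItemsPerUser);
  }
}
```

Offering both constructors keeps the common case to a single number while still allowing per-level tuning for users who hit the prolific-user or prolific-item problem.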

On Sun, Dec 4, 2011 at 1:04 PM, Daniel Zohar <dissoman@gmail.com> wrote:

> I assume the parameter does not affect the possibleItemIDs because of the
> following line:
>
> max = (int) Math.max(defaultMaxPrefsPerItemConsidered,
>     userItemCountMultiplier * Math.log(Math.max(dataModel.getNumUsers(), dataModel.getNumItems())));
>
> On Sun, Dec 4, 2011 at 2:59 PM, Daniel Zohar <dissoman@gmail.com> wrote:
>
> > Sean, your impl. is indeed better than mine, but for some reason when I
> > ran it for a user with a lot of interactions, I got 2023 possibleItemIDs
> > (although I used 10,2 in the constructor).
> >
> > Sebastian, I will also try to experiment with your patch. I would just
> > like to add that, in my opinion, as long as 'killing items' has to be done
> > manually, it is not scalable by definition. I personally would always
> > prefer to avoid these kinds of solutions. Also, in my case, only 3% of the
> > users have interacted with the most popular item, so I suppose that's not
> > exactly the case either.
>
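You can evaluate the quoted `max` expression in isolation to see how its two terms compare. This stand-alone sketch copies the expression but takes plain `int` arguments instead of calling `dataModel.getNumUsers()` and `dataModel.getNumItems()`. The method name `computeMax` and the sample values are hypothetical.

```java
public class MaxPrefsSketch {

  // Same expression as the quoted line, with the DataModel calls replaced by arguments.
  static int computeMax(int defaultMaxPrefsPerItemConsidered,
                        int userItemCountMultiplier,
                        int numUsers, int numItems) {
    return (int) Math.max(defaultMaxPrefsPerItemConsidered,
        userItemCountMultiplier * Math.log(Math.max(numUsers, numItems)));
  }

  public static void main(String[] args) {
    // ln(1,000,000) is about 13.8, so 2 * ln(...) is about 27.6
    System.out.println(computeMax(100, 2, 1_000_000, 50_000)); // the default term wins
    System.out.println(computeMax(10, 2, 1_000_000, 50_000));  // the log term wins, truncated
  }
}
```

Because `Math.log` grows slowly, the multiplier term only matters when the default is small. With a default of 100 the result stays 100 regardless of a multiplier of 2. With a default of 10, the logarithmic term (about 27.6) wins, and the `(int)` cast truncates it to 27.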
