mahout-user mailing list archives

From Ted Dunning
Subject Re: Similarity between users' groups
Date Fri, 18 Feb 2011 17:19:43 GMT
A better way to sample is to find groups with a very large number of users
and downsample the users in each such group to a maximum of about 1000 (or
even 200 if you want to be more aggressive).  Do the same with users: for
any user who belongs to a very large number of groups, downsample that
user's groups.

That won't delete a whole lot of data volume, but it will make most
recommendation algorithms go much faster.  The idea is that after you have
200 or more users in a group, you aren't learning anything new anyway.
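
As a rough illustration of the capping described above, here is a minimal
Python sketch. It assumes the data is a list of (user, group) membership
pairs held in memory; the function names cap_per_key and downsample are
made up for this example and are not part of Mahout. Using the same kind of
cap on groups per user is one reading of "do the same with users".

```python
import random
from collections import defaultdict


def cap_per_key(pairs, key_index, max_per_key, rng):
    """Keep at most max_per_key pairs for each distinct value at key_index."""
    buckets = defaultdict(list)
    for pair in pairs:
        buckets[pair[key_index]].append(pair)

    kept = []
    for members in buckets.values():
        if len(members) > max_per_key:
            # Uniform random sample without replacement from the oversized bucket.
            members = rng.sample(members, max_per_key)
        kept.extend(members)
    return kept


def downsample(memberships, max_users_per_group=1000,
               max_groups_per_user=1000, seed=0):
    """Cap users per group, then groups per user (hypothetical helper).

    memberships is a list of (user, group) pairs; both caps are assumptions
    chosen to mirror the "about 1000 (or even 200)" figure in the text.
    """
    rng = random.Random(seed)
    capped = cap_per_key(memberships, 1, max_users_per_group, rng)  # by group
    capped = cap_per_key(capped, 0, max_groups_per_user, rng)       # by user
    return capped
```

Small groups and light users pass through untouched, so only the heavy tail
is trimmed. Note that the second pass can push a group slightly below its
cap, which should not matter for this purpose.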

On Fri, Feb 18, 2011 at 7:41 AM, Radek Maciaszek wrote:

> Each user can belong to many groups so the number of combinations is
> rather big. In fact this number of combinations is so large I am
> considering to sample the users and only analyse 1 in about 256 users.
> So essentially I would have about 1000+ groups and about 150k users.
> Since one user can potentially belong to many dozens of groups this will
> easily go into millions of records anyway but perhaps will be lower than
> 100M margin you mentioned.
