mahout-dev mailing list archives

From Otis Gospodnetic <>
Subject Re: [jira] Created: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood
Date Fri, 07 Nov 2008 20:26:49 GMT
Of course, now that I put this in JIRA I'm wondering if treating similarity as the main neighbourhood
membership determiner...
In other words, what I wrote says:
Include all users whose similarity to target user is > minSimilarity.  Then, if the hood
is large, optionally trim the hood to maxHoodSize.

Scary things happen (read: slowness) if you use minSimilarity=0.001 or some other small number.
 This will create a large hood.

So now I'm wondering if one should use maxHoodSize as the primary determiner, so that the
code instead does this:
Include top maxHoodSize users.  Then remove all users whose similarity to target user is <
[minSimilarity].

I tested both approaches and they are equally fast if you pick a good minSimilarity.  But if
you pick an overly low minSimilarity.... ouch - huge hood + slow.  If you pick too high a minSimilarity
you risk finding no users that meet that criterion.
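
For illustration, here is a minimal sketch of the two orderings side by side.  It is not the
code from the patch and it does not use Mahout's classes: the class name HoodSketch, both method
names, and the plain Map of user -> similarity are all made up for this example.  It also assumes
that the cut-off in the second approach is minSimilarity, applied with the same strict ">" test
as in the first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch, not Mahout code: similarities to the target user are
// assumed to be precomputed and passed in as a userId -> similarity map.
class HoodSketch {

    // Approach 1: similarity is the primary determiner.
    // Keep every user above minSimilarity, then trim to maxHoodSize.
    static List<String> thresholdThenTrim(Map<String, Double> sims,
                                          double minSimilarity, int maxHoodSize) {
        List<Map.Entry<String, Double>> hood = new ArrayList<>();
        for (Map.Entry<String, Double> e : sims.entrySet()) {
            if (e.getValue() > minSimilarity) {
                hood.add(e); // with a tiny minSimilarity almost everyone lands here
            }
        }
        Comparator<Map.Entry<String, Double>> bySim = Map.Entry.comparingByValue();
        hood.sort(bySim.reversed()); // sorts the whole (possibly huge) hood
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : hood.subList(0, Math.min(maxHoodSize, hood.size()))) {
            result.add(e.getKey());
        }
        return result;
    }

    // Approach 2: maxHoodSize is the primary determiner.
    // Keep only the top maxHoodSize users in a bounded heap, then drop the
    // ones that are not above minSimilarity.
    static List<String> topNThenFilter(Map<String, Double> sims,
                                       double minSimilarity, int maxHoodSize) {
        Comparator<Map.Entry<String, Double>> bySim = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(bySim);
        for (Map.Entry<String, Double> e : sims.entrySet()) {
            heap.offer(e);
            if (heap.size() > maxHoodSize) {
                heap.poll(); // evict the least similar user; heap never exceeds maxHoodSize
            }
        }
        List<Map.Entry<String, Double>> top = new ArrayList<>(heap);
        top.sort(bySim.reversed());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : top) {
            if (e.getValue() > minSimilarity) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> sims = new LinkedHashMap<>();
        sims.put("a", 0.9);
        sims.put("b", 0.5);
        sims.put("c", 0.2);
        sims.put("d", 0.05);
        System.out.println(thresholdThenTrim(sims, 0.1, 2));
        System.out.println(topNThenFilter(sims, 0.1, 2));
    }
}
```

In this sketch both methods pick the same users whenever there are no tied similarities; what
differs is how much intermediate state they hold.  The first one collects and sorts everything
above the threshold, the second never keeps more than maxHoodSize candidates.  Both still come
back empty if minSimilarity is set too high.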

The drawback of a purely n-nearest approach is that the n-nearest people may really not be very
near.  Consequently, recommendations derived from them will not be the best.  My change tries
to guard against that, but one might argue that getting some not-so-good recommendations is
still better than getting no recommendations (e.g. because the given minSimilarity disqualifies
all users and results in a 0-sized neighbourhood).

Thinking out loud...

Sematext -- Lucene - Solr - Nutch

----- Original Message ----
> From: Otis Gospodnetic (JIRA) <>
> To:
> Sent: Friday, November 7, 2008 1:04:54 PM
> Subject: [jira] Created: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood
> UserSimilarity-based NearestNNeighborhood
> -----------------------------------------
>                  Key: MAHOUT-95
>                  URL:
>              Project: Mahout
>           Issue Type: Improvement
>           Components: Collaborative Filtering
>             Reporter: Otis Gospodnetic
>             Priority: Minor
>          Attachments:
> A variation of NearestNUserNeighborhood.  This version adds the minSimilarity 
> parameter, which is the primary factor for including/excluding other users from 
> the target user's neighbourhood.  Additionally, the 'n' parameter was renamed to 
> maxHoodSize and is used to optionally limit the size of the neighbourhood.
> The patch is for a brand new class, but we may really want just a single class 
> (either keep this one and axe NearestNUserNeighborhood or add this functionality 
> to NearestNUserNeighborhood), if this sounds good.
> I'll update the unit test and provide a patch for that if others think this can 
> go in.
> Thoughts?
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
