mahout-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: [jira] Created: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood
Date Mon, 10 Nov 2008 18:47:09 GMT
OK, so that's what I did now and put in JIRA.  I don't like that copy/paste part in TopItems
though :(

 
Otis



________________________________
From: Sean Owen <srowen@gmail.com>
To: mahout-dev@lucene.apache.org; Otis Gospodnetic <otis_gospodnetic@yahoo.com>
Sent: Saturday, November 8, 2008 10:13:08 AM
Subject: Re: [jira] Created: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood

I would think this is really just a small variation on the nearest-n
version, which only ever keeps up to n users in consideration. You
just add an additional filter criterion. So yes, I agree, your second
approach is right.

On Fri, Nov 7, 2008 at 8:26 PM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> Of course, now that I put this in JIRA I'm wondering if treating similarity as the main neighbourhood membership determiner...
> In other words, what I wrote says:
> Include all users whose similarity to target user is > minSimilarity.  Then, if the hood is large, optionally trim the hood to maxHoodSize.
>
> Scary things happen (read: slowness) if you use minSimilarity=0.001 or some other small number.  This will create a large hood.
>
> So now I'm wondering if one should use maxHoodSize as the primary determiner, so that the code instead does this:
> Include top maxHoodSize users.  Then remove all users whose similarity to target user is < minSimilarity.
>
> I tested both approaches and they are equally fast IF you pick a good minSimilarity.  But if you pick an overly low similarity.... ouch - huge hood + slow.  If you pick too high a minSimilarity you risk finding no users that meet that criterion.
>
> The drawback of a purely n-nearest approach is that the n-nearest people may really not be very near.  Consequently, recommendations derived from them will not be the best.  My change tries to guard against that, but one might argue that getting some not-so-good recommendations is still better than getting no recommendations (e.g. because the given minSimilarity disqualifies all users and results in a 0-sized neighbourhood).
>
> Thinking out loud...
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: Otis Gospodnetic (JIRA) <jira@apache.org>
>> To: mahout-dev@lucene.apache.org
>> Sent: Friday, November 7, 2008 1:04:54 PM
>> Subject: [jira] Created: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood
>>
>> UserSimilarity-based NearestNNeighborhood
>> -----------------------------------------
>>
>>                  Key: MAHOUT-95
>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-95
>>              Project: Mahout
>>           Issue Type: Improvement
>>           Components: Collaborative Filtering
>>             Reporter: Otis Gospodnetic
>>             Priority: Minor
>>          Attachments: UserSimilarityNearestNUserNeighborhood.java
>>
>> A variation of NearestNUserNeighborhood.  This version adds the minSimilarity
>> parameter, which is the primary factor for including/excluding other users from
>> the target user's neighbourhood.  Additionally, the 'n' parameter was renamed to
>> maxHoodSize and is used to optionally limit the size of the neighbourhood.
>>
>> The patch is for a brand new class, but we may really want just a single class
>> (either keep this one and axe NearestNUserNeighborhood or add this functionality
>> to NearestNUserNeighborhood), if this sounds good.
>>
>> I'll update the unit test and provide a patch for that if others think this can
>> go in.
>>
>> Thoughts?
>>
>
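
A minimal sketch of the two orderings discussed in this thread, assuming the similarities to the target user are already computed and held in a plain map. The class HoodSketch and both method names are hypothetical; this is not the attached UserSimilarityNearestNUserNeighborhood.java, nor Mahout's NearestNUserNeighborhood API. The first method keeps every user above the threshold before trimming, so its intermediate list can grow with a small minSimilarity. The second never holds more than maxHoodSize candidates, which matches the "only ever keeps up to n users" remark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration only; names are not taken from Mahout.
final class HoodSketch {

    // Approach 1: include all users with similarity > minSimilarity,
    // then trim the (possibly large) hood to maxHoodSize.
    static List<String> thresholdThenTrim(Map<String, Double> sims,
                                          double minSimilarity,
                                          int maxHoodSize) {
        List<Map.Entry<String, Double>> hood = new ArrayList<>();
        for (Map.Entry<String, Double> e : sims.entrySet()) {
            if (e.getValue() > minSimilarity) {
                hood.add(e); // unbounded: grows as minSimilarity shrinks
            }
        }
        hood.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e
                : hood.subList(0, Math.min(maxHoodSize, hood.size()))) {
            result.add(e.getKey());
        }
        return result;
    }

    // Approach 2: keep only the top maxHoodSize users in a bounded min-heap,
    // then remove those whose similarity is < minSimilarity.
    static List<String> topNThenFilter(Map<String, Double> sims,
                                       double minSimilarity,
                                       int maxHoodSize) {
        PriorityQueue<Map.Entry<String, Double>> heap =
            new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        for (Map.Entry<String, Double> e : sims.entrySet()) {
            heap.add(e);
            if (heap.size() > maxHoodSize) {
                heap.poll(); // evict the current least similar candidate
            }
        }
        List<Map.Entry<String, Double>> top = new ArrayList<>(heap);
        top.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : top) {
            if (e.getValue() >= minSimilarity) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

When a finite maxHoodSize is applied, both orderings end up with the same users (ties and the exact boundary value aside); what differs is how many candidates are held along the way. Either method returns an empty list when minSimilarity is set above every user's similarity, which is the 0-sized neighbourhood case raised above.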
