mahout-dev mailing list archives

From Sebastian Schelter <ssc.o...@googlemail.com>
Subject Re: CosineDistanceMeasure for 2 zero vectors?
Date Thu, 04 Apr 2013 20:53:20 GMT
Dislike should not be modeled by a zero rating IMHO. This might also
create problems with the iterateNonZero() method in our vectors.
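The iterateNonZero() concern can be illustrated with a plain-Java sketch. It does not use Mahout's Vector classes: `SparseRatings` is a hypothetical stand-in that, like a typical sparse vector, is assumed to store only non-zero entries. Under that assumption a "dislike" encoded as 0 is skipped by non-zero iteration and cannot be told apart from an item that was never rated.

```java
import java.util.Map;
import java.util.TreeMap;

public class ZeroRatingSketch {

    // Hypothetical minimal sparse vector (not Mahout's API): only non-zero
    // entries are stored, so 0.0 is indistinguishable from "not rated".
    static final class SparseRatings {
        private final Map<Integer, Double> entries = new TreeMap<>();

        void set(int index, double value) {
            if (value == 0.0) {
                entries.remove(index); // assumption: zeros are never stored
            } else {
                entries.put(index, value);
            }
        }

        double get(int index) {
            return entries.getOrDefault(index, 0.0);
        }

        // Visits only stored (non-zero) entries, in index order.
        Iterable<Map.Entry<Integer, Double>> nonZero() {
            return entries.entrySet();
        }

        int numNonZero() {
            return entries.size();
        }
    }

    public static void main(String[] args) {
        SparseRatings user = new SparseRatings();
        user.set(0, 5.0); // liked item 0
        user.set(1, 0.0); // "disliked" item 1, encoded as a zero rating
        // item 2 was never rated at all

        // Only item 0 is visited: the dislike of item 1 is skipped and
        // looks exactly like the unrated item 2.
        for (Map.Entry<Integer, Double> e : user.nonZero()) {
            System.out.println("item " + e.getKey() + " -> " + e.getValue());
        }
        System.out.println("item 1 == item 2? " + (user.get(1) == user.get(2)));
    }
}
```

Encoding dislike as a distinct value (for example a negative rating) would keep it visible to non-zero iteration.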



On 04.04.2013 22:40, Andrew Musselman wrote:
> I think it should return an "undefined" symbol.  There is no angle between
> two zero vectors.
> 
> In a practical sense, taking two zero vectors to be equivalent in the
> context of user-item vectors, say, is dodgy in my opinion.  That is akin to
> saying "If we both hate everything on this restaurant's menu we are the
> same person."
> 
> 
> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.filimon@gmail.com> wrote:
> 
>> Suneel is right. :)
>>
>> Let me explain how this came up:
>> - When clustering, and assigning a point to a cluster, the centroid needs
>> to be updated.
>> - To update the centroid in the nearest neighbor searcher classes, the
>> centroid must first be removed.
>> - To remove the centroid, we get the closest vector (search for it, and it
>> should be itself) and then remove it from the data structures.
>> => However, when the centroid is 0, the nearest vector (which should be
>> itself) has a huge distance (1 rather than 0) and this trips a check.
>>
>>
>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <srowen@gmail.com> wrote:
>>
>>> It sounds pretty undefined, but I would tend to define the distance as
>>> 0 in this case of course. And that means defining the cosine as 1.
>>> Which class in particular? There are a few implementations of this
>>> distance measure.
>>>
>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.filimon@gmail.com> wrote:
>>>> In the case where both vectors are all zeros, the angle between them is 0,
>>>> so the cosine is therefore 1 and so the distance returned should be 0
>>>> (unless I misunderstood what the distance does).
>>>>
>>>> In Mahout, when calling distance() however, if both the denominator and
>>>> dotProduct are 0 (which is true when both vectors are 0), the returned
>>>> value is 1.
>>>>
>>>> This looks like a bug to me and I would open a JIRA issue and fix it, but I
>>>> want to make sure there's nothing I could possibly be missing.
>>>>
>>>> Thoughts?
>>>
>>
> 
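The three options raised in this thread for two zero vectors (return 1, as the reported behaviour does; return 0, as Sean suggests; or treat the result as undefined, as Andrew suggests) can be compared in a standalone Java sketch. This is not Mahout's CosineDistanceMeasure: the class, the `ZeroPolicy` enum and the self-distance check are hypothetical, and returning 1.0 when exactly one vector is zero is an assumption of the sketch.

```java
public class CosineDistanceSketch {

    // Hypothetical names for the options raised in the thread for two zero vectors.
    enum ZeroPolicy { RETURN_ONE, RETURN_ZERO, UNDEFINED }

    // Cosine distance = 1 - (a . b) / (|a| |b|), with explicit zero-vector handling.
    static double cosineDistance(double[] a, double[] b, ZeroPolicy policy) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        double normA = 0.0; // squared norm of a
        double normB = 0.0; // squared norm of b
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 && normB == 0.0) {
            // Both vectors are zero: the angle is undefined, so a policy is needed.
            if (policy == ZeroPolicy.RETURN_ONE) {
                return 1.0;    // behaviour reported in the original message
            } else if (policy == ZeroPolicy.RETURN_ZERO) {
                return 0.0;    // "distance 0, cosine 1" proposal
            }
            return Double.NaN; // "undefined" proposal
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0; // assumption: zero vs. non-zero is treated like orthogonal vectors (cosine 0)
        }
        return 1.0 - dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        double[] zeroCentroid = {0.0, 0.0, 0.0};
        for (ZeroPolicy policy : ZeroPolicy.values()) {
            // Hypothetical self-check like the one described in the thread:
            // the nearest vector to a centroid should be itself, at distance ~0.
            double selfDistance = cosineDistance(zeroCentroid, zeroCentroid, policy);
            boolean passes = selfDistance < 1e-9; // also false for NaN
            System.out.println(policy + ": self-distance = " + selfDistance
                    + ", self-check passes = " + passes);
        }
    }
}
```

With a zero centroid, only RETURN_ZERO gives a self-distance of 0. The NaN from UNDEFINED fails the `< 1e-9` comparison just as 1.0 does, so an "undefined" result would need explicit handling in the nearest-neighbor searcher rather than in the check alone.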

