mahout-dev mailing list archives

From Daniel McEnnis <dmcen...@gmail.com>
Subject Re: new distance metric
Date Wed, 30 Mar 2011 14:55:29 GMT
Sebastian,

It will be in the next patch.  Thanks for the heads up.

Daniel.

On Wed, Mar 30, 2011 at 1:35 AM, Sebastian Schelter <ssc@apache.org> wrote:
> Hi Daniel,
>
> We would also need a "distributed" implementation of this new metric. Could
> you do that too?
>
> Shouldn't be too hard, just have a look at the other implementations in
> org.apache.mahout.math.hadoop.similarity.vector.
>
> --sebastian
>
>
> On 30.03.2011 00:40, Sean Owen wrote:
>>
>> Great, the best place for this would be a JIRA issue:
>> https://issues.apache.org/jira/browse/MAHOUT
>> I think it needs a bit of style work. For example, it ought to be very
>> much like TanimotoCoefficientSimilarity. If you copied that and edited
>> a few key methods, you'd be a lot closer I think.
>> I guess I find the core computation a little quirky:
>>
>>            double distance = preferring1 + preferring2 - 2 * intersection;
>>            if (distance < 1.0) {
>>                distance = 1.0 - distance;
>>            } else {
>>                distance = -1.0 + 1.0 / distance;
>>            }
>>
>> distance is an int, so I think it's
>>
>>            int distance = preferring1 + preferring2 - 2 * intersection;
>>            double similarity;  // separate double, since the result is fractional
>>            if (distance == 0) {
>>                similarity = 1.0;
>>            } else {
>>                similarity = -1.0 + 1.0 / distance;
>>            }
>>
>> The resulting values are a little odd then -- it can return values in
>> [-1,0], or 1.
>>
>> By default I'd go with something more like "1.0 / (1.0 + distance)", I
>> suppose. That isn't the one right way to map a distance to a
>> similarity, but it would be consistent with
>> EuclideanDistanceSimilarity.
>>
>>
>> I'd actually welcome you to expand this idea and not just make a
>> "boolean pref" version of this but one that computes an actual
>> city-block distance for prefs with ratings too, for completeness.
>>
>>
>> I know this as "Manhattan distance". Is that an Americanism or is that
>> actually the more common name to anyone?
>>
>>
>>
>> On Tue, Mar 29, 2011 at 10:16 PM, Daniel McEnnis <dmcennis@gmail.com> wrote:
>>>
>>> Dear,
>>>
>>> Here is a patch of a new distance metric for the collaborative
>>> filtering modules - CityBlockDistance.  With the 0 - 1 binary split on
>>> preference, KLDistance, AHDistance, and Symmetric KLDistance don't
>>> make sense.
>>>
>>> Daniel McEnnis.
>
>
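
For reference, the computation discussed in this thread can be sketched as
below. This is only an illustration, not the code from the patch: the class
name CityBlockSketch and its method names are hypothetical, item IDs are
assumed to be Long values, and the rated variant assumes that an item one
user has not rated counts as 0.0, which is just one possible convention.
The distance-to-similarity mapping is the "1.0 / (1.0 + distance)" form
suggested above.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; not part of Mahout and not the submitted patch.
final class CityBlockSketch {

  // Boolean preferences: the city-block distance is the number of items
  // preferred by exactly one of the two users, that is
  // preferring1 + preferring2 - 2 * intersection.
  static int booleanDistance(Set<Long> items1, Set<Long> items2) {
    Set<Long> common = new HashSet<>(items1);
    common.retainAll(items2);
    return items1.size() + items2.size() - 2 * common.size();
  }

  // Rated preferences: sum of absolute rating differences over all items
  // either user has rated. ASSUMPTION: a missing rating is treated as 0.0.
  static double ratedDistance(Map<Long, Double> prefs1, Map<Long, Double> prefs2) {
    Set<Long> allItems = new HashSet<>(prefs1.keySet());
    allItems.addAll(prefs2.keySet());
    double sum = 0.0;
    for (Long item : allItems) {
      double r1 = prefs1.getOrDefault(item, 0.0);
      double r2 = prefs2.getOrDefault(item, 0.0);
      sum += Math.abs(r1 - r2);
    }
    return sum;
  }

  // Maps a non-negative distance to a similarity in (0, 1];
  // a distance of 0 gives a similarity of 1.0.
  static double toSimilarity(double distance) {
    return 1.0 / (1.0 + distance);
  }
}
```

With item sets {1, 2, 3} and {2, 3, 4}, booleanDistance gives 2 (items 1 and
4 are each preferred by only one user), and toSimilarity(2) gives 1/3.
Identical sets give a distance of 0 and a similarity of 1.0, so the result
always stays within (0, 1].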
