"a "distributed" implementation of this new metric"
What would this do?
On Wed, Mar 30, 2011 at 7:55 AM, Daniel McEnnis <dmcennis@gmail.com> wrote:
> Sebastian,
>
> It will be in the next patch. Thanks for the heads up.
>
> Daniel.
>
> On Wed, Mar 30, 2011 at 1:35 AM, Sebastian Schelter <ssc@apache.org> wrote:
>> Hi Daniel,
>>
>> We would also need a "distributed" implementation of this new metric. Could
>> you do that too?
>>
>> Shouldn't be too hard, just have a look at the other implementations in
>> org.apache.mahout.math.hadoop.similarity.vector.
>>
>> sebastian
>>
>>
>> On 30.03.2011 00:40, Sean Owen wrote:
>>>
>>> Great, the best place for this would be a JIRA issue:
>>> https://issues.apache.org/jira/browse/MAHOUT
>>> I think it needs a bit of style work. For example, it ought to be very
>>> much like TanimotoCoefficientSimilarity. If you copied that and edited
>>> a few key methods, you'd be a lot closer I think.
>>> I guess I find the core computation a little quirky:
>>>
>>> double distance = preferring1+preferring2 - 2*intersection;
>>> if(distance< 1.0){
>>> distance=1.0-distance;
>>> }else{
>>> distance = -1.0 + 1.0 / distance;
>>> }
>>>
>>> distance is an int, so I think it's
>>>
>>> int distance = preferring1+preferring2 - 2*intersection;
>>> if(distance == 0){
>>> distance=1;
>>> }else{
>>> distance = -1.0 + 1.0 / distance;
>>> }
>>>
>>> The resulting values are a little odd then - it can return values in
>>> [-1,0], or 1.
>>>
>>> By default I'd go with something more like "1.0 / (1.0 + distance)" I
>>> suppose, though that's not somehow the one right way to map a distance
>>> to a similarity - though it would be consistent with
>>> EuclideanDistanceSimilarity.
>>>
>>>
>>> I'd actually welcome you to expand this idea and not just make a
>>> "boolean pref" version of this but one that computes an actual
>>> cityblock distance for prefs with ratings too, for completeness.
>>>
>>>
>>> I know this as "Manhattan distance". Is that an Americanism or is that
>>> actually the more common name to anyone?
>>>
>>>
>>>
>>> On Tue, Mar 29, 2011 at 10:16 PM, Daniel McEnnis <dmcennis@gmail.com>
>>> wrote:
>>>>
>>>> Dear,
>>>>
>>>> Here is a patch of a new distance metric for the collaborative
>>>> filtering modules - CityBlockDistance. With the 0 - 1 binary split on
>>>> preference, KLDistance, AHDistance, and Symmetric KLDistance don't
>>>> make sense.
>>>>
>>>> Daniel McEnnis.
>>
>>
>
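
For reference, here is a rough sketch of the computation discussed in the
quoted thread. It is not the patch and not Mahout API: the class name
CityBlockSketch and its method names are made up for illustration, and
treating a missing rating as 0 in the rated version is an assumption the
thread does not settle. The similarity mapping is the "1.0 / (1.0 + distance)"
form suggested above.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; hypothetical names, not part of Mahout. */
final class CityBlockSketch {

  private CityBlockSketch() {}

  /** Boolean prefs: count of items preferred by exactly one of the two users. */
  static int booleanDistance(Set<Long> items1, Set<Long> items2) {
    Set<Long> common = new HashSet<>(items1);
    common.retainAll(items2);
    // preferring1 + preferring2 - 2 * intersection
    return items1.size() + items2.size() - 2 * common.size();
  }

  /**
   * Rated prefs: sum of |r1 - r2| over the union of rated items.
   * Assumption: an item one user has not rated counts as rating 0.
   */
  static double ratedDistance(Map<Long, Double> prefs1, Map<Long, Double> prefs2) {
    Set<Long> allItems = new HashSet<>(prefs1.keySet());
    allItems.addAll(prefs2.keySet());
    double sum = 0.0;
    for (Long item : allItems) {
      double r1 = prefs1.getOrDefault(item, 0.0);
      double r2 = prefs2.getOrDefault(item, 0.0);
      sum += Math.abs(r1 - r2);
    }
    return sum;
  }

  /** Maps a non-negative distance to a similarity in (0, 1]. */
  static double similarity(double distance) {
    return 1.0 / (1.0 + distance);
  }
}
```

With this mapping identical preference sets get similarity 1.0, and the value
falls toward 0 as the sets diverge, so there is no special case for a zero
distance.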

Lance Norskog
goksron@gmail.com
