mahout-user mailing list archives

From: Sean Owen <sro...@gmail.com>
Subject: Re: similarity metrics?
Date: Wed, 13 Jul 2011 22:12:36 GMT
Yes that's it, according to the reference -- or rather, I suppose, construe
the vector as encoding a discrete distribution. Element i has probability
proportional to the value at i. (It can't have negative values, of course.)

It would seem to be what you write below, but as the square root of the sum
of squares of those differences. (Not that I've ever encountered it before.)
Is L_0.5 the Minkowski 0.5 distance? This looks more like good old Euclidean
distance in that sense.
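
For the record, the definition on the Wikipedia page [1] works out to

  HD(P, Q) = (1 / \sqrt{2}) \sqrt{ \sum_i (\sqrt{p_i} - \sqrt{q_i})^2 }

where the 1/\sqrt{2} factor normalizes the distance into [0, 1].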

I wonder out loud what effect taking differences of square roots, rather
than of the vector element values themselves, has intuitively... since it is
otherwise so close to Euclidean distance.

Yes, in any event this is easy to implement.
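
For concreteness, here is a minimal sketch in plain Java (illustrative only,
not tied to Mahout's DistanceMeasure interface; it assumes two dense arrays
of equal length whose nonnegative entries each sum to 1):

public final class HellingerDistance {

  // Hellinger distance between two discrete distributions p and q,
  // per the definition in [1].
  public static double distance(double[] p, double[] q) {
    double sumOfSquares = 0.0;
    for (int i = 0; i < p.length; i++) {
      // Difference of the square roots, squared and accumulated.
      double diff = Math.sqrt(p[i]) - Math.sqrt(q[i]);
      sumOfSquares += diff * diff;
    }
    // The 1/sqrt(2) factor normalizes the result into [0, 1].
    return Math.sqrt(sumOfSquares) / Math.sqrt(2.0);
  }

  public static void main(String[] args) {
    double[] p = {0.5, 0.3, 0.2};
    double[] q = {0.4, 0.4, 0.2};
    System.out.println(distance(p, q)); // small; p and q are close
  }
}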


On Wed, Jul 13, 2011 at 10:53 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> You would have to encode the distributions as vectors.
>
> For discrete distributions, I think this is relatively trivial, since you
> could interpret each vector entry as the probability of an element i of the
> domain of the distribution. I think that would result in the Hellinger
> distance [1] being defined as:
> distance [1] being defined as:
>
>  HD(P, Q) = \sum_i (\sqrt{p_i} - \sqrt{q_i})
>
> This makes it look a lot like L_0.5, which we already have. Perhaps the
> original poster can clarify whether this is what they want?
>
> [1] http://en.wikipedia.org/wiki/Hellinger_distance
>
>
>
> On Wed, Jul 13, 2011 at 2:14 PM, Sean Owen <srowen@gmail.com> wrote:
>
> > How do you apply this metric to vectors?
> >
>
