mahout-user mailing list archives

From Ted Dunning <>
Subject Re: Clustering user profiles
Date Fri, 13 Jan 2012 20:49:28 GMT
I usually prefer to represent location as an xyz triple on a unit sphere.
 That allows Euclidean distance to be useful.
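
A minimal sketch of that mapping, assuming locations arrive as
latitude/longitude pairs in degrees. The function names here are
illustrative only and are not part of Mahout.

```python
import math

def latlon_to_xyz(lat_deg, lon_deg):
    """Map latitude/longitude (degrees) to a point on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),  # x
        math.cos(lat) * math.sin(lon),  # y
        math.sin(lat),                  # z
    )

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two points on either side of the date line are far apart in raw
# longitude (179 vs. -179) but close together as xyz triples.
p = latlon_to_xyz(0.0, 179.0)
q = latlon_to_xyz(0.0, -179.0)
chord = euclidean(p, q)
```

The distance computed this way is the straight-line chord through the
sphere rather than the great-circle distance, but it grows monotonically
with the great-circle distance, so nearest-neighbor ordering is preserved.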

For the 1-of-n encoded values, Euclidean distance works as well.  It also
works fine for gender.

The only issue is how to combine these with reasonable weightings.  An easy
way to do this is to have a weighted Euclidean distance which looks like

    distance(A, B) = sqrt {\sum_i w_i (a_i - b_i)^2}

Figuring out the weights is a bit tricky, but not horrendously hard.
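
The formula above can be sketched in a few lines. This is only an
illustration: the profile layout (xyz location followed by a 1-of-2
gender encoding) and the weight values are made-up assumptions, not
recommended settings.

```python
import math

def weighted_euclidean(a, b, w):
    """sqrt(sum_i w_i * (a_i - b_i)^2) for equal-length sequences."""
    if not (len(a) == len(b) == len(w)):
        raise ValueError("a, b and w must have the same length")
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

# Hypothetical profile layout: [x, y, z, gender_0, gender_1]
user_a = [1.0, 0.0, 0.0, 1.0, 0.0]
user_b = [0.0, 1.0, 0.0, 0.0, 1.0]

# Hypothetical weights: location counts four times as much as gender.
weights = [4.0, 4.0, 4.0, 1.0, 1.0]

d = weighted_euclidean(user_a, user_b, weights)
```

Since w_i (a_i - b_i)^2 = (sqrt(w_i) a_i - sqrt(w_i) b_i)^2 for
non-negative weights, the same result can be had by multiplying each
coordinate by sqrt(w_i) up front and then using an ordinary Euclidean
distance measure on the scaled vectors.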

On Fri, Jan 13, 2012 at 12:02 PM, Sean Owen <> wrote:

> Certainly not the only solution. As I've been saying: what would it
> mean to have n distance measures -- how would you combine them?
> If you can answer that, you can likely just as easily transform the
> input so that the result is meaningful when all dimensions are
> combined by one metric.
> This is certainly the basic idea, and it isn't necessarily simple.
> But it's straightforward and I think strictly less hard than what you
> are contemplating.
> On Fri, Jan 13, 2012 at 4:57 PM, Raviv Pavel <> wrote:
> > I think the only solution would be to develop a custom distance measure
> > that's aware of the "meaning" of each dimension(s) and returns the
> > distance accordingly.
> > Unless there is a way to vectorize user profiles in such a way that will
> > allow me to use one of the built in distance measures.
> >
