mahout-user mailing list archives

From Raviv Pavel <street...@gmail.com>
Subject Re: Clustering user profiles
Date Fri, 13 Jan 2012 09:08:52 GMT
My initial plan was to do exactly that: use 0 or 1 for gender, age as is, lat
and lon as two dimensions, and one dimension per possible interest holding 0
or 1 (each interest value is mapped to its own offset in the vector).
For simplicity, let's assume I have 3 types of interests, so a person's
vector would look like this:

d[0] = 1 (gender)
d[1] = 15.5 (latitude)
d[2] = 50.5 (longitude)
d[3] = 41 (age)
d[4] = 0 (not interested in A)
d[5] = 1 (interested in B)
d[6] = 0 (not interested in C)
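
A minimal sketch of how such a vector could be built, assuming plain Java arrays rather than Mahout's vector classes. The class name `ProfileEncoder`, the method `encode`, and the interest labels are hypothetical and used only for illustration; the offsets follow the layout listed above.

```java
import java.util.List;
import java.util.Set;

/** Illustrative encoder for the profile layout above (names are hypothetical). */
final class ProfileEncoder {

    /**
     * Builds d[0] = gender code (0 or 1), d[1] = latitude, d[2] = longitude,
     * d[3] = age, followed by one 0/1 flag per known interest.
     * The position of an interest in knownInterests fixes its offset.
     */
    static double[] encode(int genderCode, double latitude, double longitude,
                           double age, List<String> knownInterests,
                           Set<String> userInterests) {
        double[] d = new double[4 + knownInterests.size()];
        d[0] = genderCode;
        d[1] = latitude;
        d[2] = longitude;
        d[3] = age;
        for (int i = 0; i < knownInterests.size(); i++) {
            // 1 if the user has this interest, 0 otherwise
            d[4 + i] = userInterests.contains(knownInterests.get(i)) ? 1.0 : 0.0;
        }
        return d;
    }
}
```

Calling `ProfileEncoder.encode(1, 15.5, 50.5, 41, List.of("A", "B", "C"), Set.of("B"))` would reproduce the example vector.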


I'm probably misunderstanding something here, but with this approach no
single built-in distance measure will take into account that dimensions 1 and
2 should be compared as a pair using Euclidean distance, while dimensions 4, 5
and 6 should be compared by counting the values two vectors have in common.
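
One way to express that idea is a composite measure that scores each group of dimensions separately and then adds the weighted parts. The sketch below is only an illustration on plain `double[]` arrays, not a Mahout class: the name `ProfileDistance`, the weights, and the choice to turn "common interest values" into a distance as 1 minus the matching fraction are all assumptions. Plugging something like this into a clustering job would still require wrapping it in Mahout's own distance-measure interface, which is not shown here.

```java
/** Illustrative composite distance for the 7-dimensional profile layout (hypothetical names). */
final class ProfileDistance {
    // Offsets from the example: d[0] gender, d[1] lat, d[2] lon, d[3] age, d[4..] interests
    static final int GENDER = 0, LAT = 1, LON = 2, AGE = 3, INTERESTS_START = 4;

    /** Euclidean distance over the (lat, lon) pair only. */
    static double locationDistance(double[] a, double[] b) {
        double dLat = a[LAT] - b[LAT];
        double dLon = a[LON] - b[LON];
        return Math.sqrt(dLat * dLat + dLon * dLon);
    }

    /** 1 minus the fraction of interest flags on which the two vectors agree. */
    static double interestDistance(double[] a, double[] b) {
        int total = a.length - INTERESTS_START;
        if (total <= 0) {
            return 0.0; // no interest dimensions to compare
        }
        int common = 0;
        for (int i = INTERESTS_START; i < a.length; i++) {
            if (a[i] == b[i]) {
                common++;
            }
        }
        return 1.0 - (double) common / total;
    }

    /** Weighted sum of the per-group distances; the weights are assumed tuning knobs. */
    static double distance(double[] a, double[] b, double wGender,
                           double wLocation, double wAge, double wInterests) {
        if (a.length != b.length || a.length < INTERESTS_START) {
            throw new IllegalArgumentException("vectors must share the profile layout");
        }
        double gender = (a[GENDER] == b[GENDER]) ? 0.0 : 1.0; // simple mismatch
        double age = Math.abs(a[AGE] - b[AGE]);               // absolute difference
        return wGender * gender
                + wLocation * locationDistance(a, b)
                + wAge * age
                + wInterests * interestDistance(a, b);
    }
}
```

Because latitude/longitude, age, and the 0/1 flags live on very different scales, the weights (or a prior normalization of each group) largely decide which attributes dominate the result, so they would need tuning against real data.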

--
View this message in context: http://lucene.472066.n3.nabble.com/Clustering-user-profiles-tp3654678p3656144.html
Sent from the Mahout User List mailing list archive at Nabble.com.
