mahout-user mailing list archives

From Dan Brickley <>
Subject Re: Clustering user profiles
Date Fri, 13 Jan 2012 11:19:47 GMT
On 13 January 2012 12:02, Robert Stewart <> wrote:
> Rather than using Gender as a single dimension, why not make Male and Female separate
> dimensions, with values 0 or 1 for False or True?

>>> d[1] = 15.5 (latitude)
>>> d[2] = 50.5 (longitude)
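The encoding suggested in the quote can be sketched in a few lines. This is a minimal illustration, not anything from Mahout itself: it assumes a profile is a plain dict with hypothetical keys `gender`, `lat` and `lng`, and a plain Python list stands in for whatever vector type the clustering job actually uses.

```python
def encode_profile(profile):
    """Turn one user profile into a numeric feature vector.

    Gender becomes two indicator dimensions (is_male, is_female) instead
    of one coded dimension, so no artificial ordering or distance is
    implied between the categories. Unknown gender leaves both at 0.0.
    """
    gender = profile.get("gender")
    is_male = 1.0 if gender == "male" else 0.0
    is_female = 1.0 if gender == "female" else 0.0
    # Raw latitude/longitude are appended unchanged, as in the quoted example.
    return [is_male, is_female, float(profile["lat"]), float(profile["lng"])]


print(encode_profile({"gender": "female", "lat": 15.5, "lng": 50.5}))
# [0.0, 1.0, 15.5, 50.5]
```

Note that the latitude/longitude indices shift by one compared with the quoted `d[1]`/`d[2]` layout, because gender now occupies two dimensions.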

Raw lat/long can be rather cryptic. The GeoNames folk have Web
services (and/or downloadable data) that map these to more socially
relevant entities, such as named places.
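A rough offline sketch of that kind of enrichment: snap each point to its nearest known place by great-circle (haversine) distance, then use the place as a categorical feature. The gazetteer rows below are hypothetical placeholders; in practice they might be loaded from a downloaded place-name dataset, and this is a simple linear scan rather than any particular service's lookup method.

```python
import math

# Hypothetical gazetteer entries: (place_name, latitude, longitude).
GAZETTEER = [
    ("place_a", 15.0, 50.0),
    ("place_b", 40.0, -3.0),
    ("place_c", -33.0, 151.0),
]

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_place(lat, lng, gazetteer=GAZETTEER):
    """Return the name of the gazetteer entry closest to (lat, lng)."""
    return min(gazetteer, key=lambda row: haversine_km(lat, lng, row[1], row[2]))[0]


def place_features(lat, lng, gazetteer=GAZETTEER):
    """One indicator dimension per known place, 1.0 for the nearest one."""
    name = nearest_place(lat, lng, gazetteer)
    return [1.0 if row[0] == name else 0.0 for row in gazetteer]


print(nearest_place(15.5, 50.5))   # place_a
print(place_features(15.5, 50.5))  # [1.0, 0.0, 0.0]
```

A linear scan is fine for a small gazetteer; for a full place-name dump a spatial index would be the obvious next step.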


There's also a lat/long to Wikipedia entry service, see
...which will get you entities known to DBpedia, Freebase etc.,
allowing more national or regional features to be folded in if needed.

Why have the machine learning layers re-learn stuff that can just be
looked up in a free encyclopaedia? Better to enrich than


