mahout-user mailing list archives

From Dan Brickley <dan...@danbri.org>
Subject Re: Clustering user profiles
Date Fri, 13 Jan 2012 11:19:47 GMT
On 13 January 2012 12:02, Robert Stewart <bstewart.ny@gmail.com> wrote:
> Rather than using Gender as a single dimension, why not make Male and
> Female as separate dimensions, with values 0 or 1 if True or False?

>>> d[1] = 15.5 (latitude)
>>> d[2] = 50.5 (longitude)

Raw lat/long can be rather cryptic. The Geonames folk have Web
services (and/or downloadable data) that map these to more socially
relevant entities.

See http://www.geonames.org/export/web-services.html#findNearby
e.g. http://api.geonames.org/extendedFindNearby?lat=47.3&lng=9&username=demo
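
A minimal Python sketch of that lookup, for illustration only. It builds
the same extendedFindNearby URL as above and pulls place names out of an
XML reply. The helper names (build_lookup_url, parse_place_names) are
made up here, and SAMPLE_RESPONSE is a hand-written stand-in that assumes
the reply is a list of <geoname> elements with <name> and <fcode>
children; it is not real service output, so check the Geonames docs for
the actual response shape before relying on it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the example URL in the message.
BASE_URL = "http://api.geonames.org/extendedFindNearby"


def build_lookup_url(lat, lng, username="demo"):
    """Build the lookup URL for one lat/long pair (hypothetical helper)."""
    query = urlencode({"lat": lat, "lng": lng, "username": username})
    return f"{BASE_URL}?{query}"


def parse_place_names(xml_text):
    """Return (name, feature code) pairs for every <geoname> element.

    Assumes the <geoname>/<name>/<fcode> layout described in the lead-in.
    """
    root = ET.fromstring(xml_text)
    return [(g.findtext("name"), g.findtext("fcode"))
            for g in root.iter("geoname")]


# Hand-written placeholder reply, NOT real Geonames output.
SAMPLE_RESPONSE = """<geonames>
  <geoname><name>Example Country</name><fcode>PCLI</fcode></geoname>
  <geoname><name>Example Region</name><fcode>ADM1</fcode></geoname>
</geonames>"""

if __name__ == "__main__":
    print(build_lookup_url(47.3, 9))
    print(parse_place_names(SAMPLE_RESPONSE))
```

Fetching the URL (e.g. with urllib.request.urlopen) is left out so the
sketch runs offline; the "demo" username is the one from the example URL
and is presumably not meant for bulk use.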

There's also a lat/long to Wikipedia entry service, see
http://www.geonames.org/export/wikipedia-webservice.html#findNearbyWikipedia
...which will get you entities known to DBpedia, Freebase etc.,
allowing more national or regional features to be folded in if needed.
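
One possible way to fold such a looked-up feature into the profile
vector, sketched in Python. It reuses the 0/1-per-category idea quoted
at the top of this message for gender and applies it to a country code
as well. The function names, the vector layout and the three-country
vocabulary are assumptions made for the example, not anything Mahout or
Geonames prescribes.

```python
def one_hot(value, categories):
    """Return one 0/1 dimension per category; all zeros if value is unknown."""
    return [1.0 if value == c else 0.0 for c in categories]


def build_profile_vector(lat, lng, gender, country, countries):
    """Concatenate raw coordinates with one-hot gender and country dimensions.

    Layout (an assumption for this sketch):
    [lat, lng, is_male, is_female, one dimension per country code].
    """
    return ([lat, lng]
            + one_hot(gender, ["male", "female"])
            + one_hot(country, countries))


if __name__ == "__main__":
    # Hypothetical vocabulary of country codes seen in the data set.
    countries = ["AT", "CH", "DE"]
    print(build_profile_vector(47.3, 9.0, "female", "CH", countries))
```

Raw coordinates and 0/1 flags sit on very different scales, so some
normalisation or weighting would probably be needed before clustering.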

Why have the machine learning layers re-learn stuff that can just be
looked up in a free encyclopaedia? Better to enrich than
rediscover...?

cheers,

Dan
