mahout-user mailing list archives

From Josh Patterson <>
Subject Re: Generic approach to kNN
Date Thu, 13 Oct 2011 03:26:01 GMT
Without knowing a lot about what you are doing, I'd say you could
do this rather simply, as Sean has said, with a basic similarity function.
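For data like Felix's (quoted below), where a coordinate may be a real number or a string, a basic similarity function could look like this Java sketch. `MixedDistance` is a hypothetical helper, not a Mahout class, and the scoring rule is an assumption: numeric coordinates contribute their absolute difference, any other pair contributes 0 if equal and 1 otherwise, and a total distance d is mapped to a similarity 1 / (1 + d).

```java
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical distance for vectors whose coordinates may be reals or strings.
 * Illustration only; this is not a Mahout class.
 */
class MixedDistance {

    /**
     * Sums per-coordinate dissimilarities:
     * numeric pairs add their absolute difference,
     * any other pair adds 0 if equal and 1 if different.
     */
    static double distance(List<Object> a, List<Object> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double total = 0.0;
        for (int i = 0; i < a.size(); i++) {
            Object x = a.get(i);
            Object y = b.get(i);
            if (x instanceof Number && y instanceof Number) {
                total += Math.abs(((Number) x).doubleValue() - ((Number) y).doubleValue());
            } else {
                total += Objects.equals(x, y) ? 0.0 : 1.0;
            }
        }
        return total;
    }

    /** Maps a distance d to a similarity 1 / (1 + d): 1.0 means identical. */
    static double similarity(List<Object> a, List<Object> b) {
        return 1.0 / (1.0 + distance(a, b));
    }

    public static void main(String[] args) {
        List<Object> p = List.of(1.0, "red", 3);
        List<Object> q = List.of(2.5, "blue", 3);
        // |1.0 - 2.5| + mismatch("red", "blue") + |3 - 3| = 1.5 + 1 + 0 = 2.5
        System.out.println(distance(p, q));
        System.out.println(similarity(p, q));
    }
}
```

In practice the numeric terms would usually be scaled (for example, by each coordinate's range) so that one wide-ranging coordinate does not dominate the total.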

The really simple "batch" version of this might be:

1. Define a similarity function (expressed as a distance score).
2. Take as input some sort of "base point / instance" to search
against.
3. The map side of the MR job just takes each input vector and scores
it against the base point with that distance function.
4. Output using the total order partitioner, sorting on the distance
score.
5. Read the first k entries at the front of the sorted output; those
are the k nearest neighbors.
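As a rough, single-machine illustration of those five steps (no Hadoop involved), here is a Java sketch. Euclidean distance is an assumed stand-in for the similarity function, a plain in-memory sort stands in for the total order partitioner and the shuffle, and `BatchKnn`, `Scored`, and `nearest` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Single-JVM sketch of the batch kNN steps; no Hadoop involved.
 * Euclidean distance is an assumed stand-in for the similarity function.
 */
class BatchKnn {

    /** A (distance, id) pair, like the record a mapper would emit. */
    static final class Scored {
        final double distance;
        final int id;

        Scored(double distance, int id) {
            this.distance = distance;
            this.id = id;
        }
    }

    // Step 1: the similarity function, expressed as a distance (smaller = closer).
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Steps 2-5: score each vector against the base point, sort by distance, keep k.
    static List<Scored> nearest(double[] basePoint, List<double[]> data, int k) {
        List<Scored> scored = new ArrayList<>();
        for (int id = 0; id < data.size(); id++) { // the "map" side
            scored.add(new Scored(euclidean(basePoint, data.get(id)), id));
        }
        // Stand-in for the total order partitioner + shuffle sort.
        scored.sort(Comparator.comparingDouble((Scored s) -> s.distance));
        // The first k entries at the front of the sorted output.
        int n = Math.max(0, Math.min(k, scored.size()));
        return new ArrayList<>(scored.subList(0, n));
    }

    public static void main(String[] args) {
        List<double[]> data = List.of(
                new double[] {0, 0}, new double[] {5, 5},
                new double[] {1, 1}, new double[] {-2, 0});
        for (Scored s : nearest(new double[] {0.4, 0.4}, data, 2)) {
            System.out.println(s.id + " " + s.distance);
        }
    }
}
```

On a cluster the scoring loop would live in a Mapper, and only the first k records of the totally ordered output would need to be read back.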

A more complicated option might be something along the lines of "MD-tree":

where they store a k-d tree in HBase to give relatively low-latency
kNN search queries.
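To show what the k-d tree itself buys, here is a minimal in-memory Java sketch of k-nearest-neighbor search over a k-d tree, using squared Euclidean distance (an assumption). It does not model how MD-tree lays the tree out in HBase, and `KdTree` and its methods are hypothetical names. The search descends toward the query first and only visits the other side of a split when the splitting plane is closer than the worst of the k candidates kept so far, which is what lets it skip most of the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Minimal in-memory k-d tree with kNN search; not the MD-tree/HBase layout. */
class KdTree {

    private static final class Node {
        final double[] point;
        final int axis;
        Node left, right;

        Node(double[] point, int axis) {
            this.point = point;
            this.axis = axis;
        }
    }

    private final int dims;
    private final Node root;

    KdTree(List<double[]> points) {
        this.dims = points.isEmpty() ? 0 : points.get(0).length;
        this.root = build(new ArrayList<>(points), 0);
    }

    // Split on the median along the current axis, cycling through dimensions.
    private Node build(List<double[]> pts, int depth) {
        if (pts.isEmpty()) return null;
        int axis = depth % dims;
        pts.sort(Comparator.comparingDouble((double[] p) -> p[axis]));
        int mid = pts.size() / 2;
        Node node = new Node(pts.get(mid), axis);
        node.left = build(new ArrayList<>(pts.subList(0, mid)), depth + 1);
        node.right = build(new ArrayList<>(pts.subList(mid + 1, pts.size())), depth + 1);
        return node;
    }

    /** Returns up to k points closest to the query, nearest first. */
    List<double[]> nearest(double[] query, int k) {
        // Max-heap on distance: the worst of the current candidates sits on top.
        PriorityQueue<double[]> best = new PriorityQueue<>(
                Comparator.comparingDouble((double[] p) -> sqDist(p, query)).reversed());
        search(root, query, k, best);
        List<double[]> result = new ArrayList<>(best);
        result.sort(Comparator.comparingDouble((double[] p) -> sqDist(p, query)));
        return result;
    }

    private void search(Node node, double[] q, int k, PriorityQueue<double[]> best) {
        if (node == null || k <= 0) return;
        best.offer(node.point);
        if (best.size() > k) best.poll(); // drop the current worst candidate
        double diff = q[node.axis] - node.point[node.axis];
        Node near = diff < 0 ? node.left : node.right;
        Node far = diff < 0 ? node.right : node.left;
        search(near, q, k, best);
        // Cross the splitting plane only if it could hold a closer point.
        if (best.size() < k || diff * diff < sqDist(best.peek(), q)) {
            search(far, q, k, best);
        }
    }

    private static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    public static void main(String[] args) {
        List<double[]> pts = List.of(
                new double[] {2, 3}, new double[] {5, 4}, new double[] {9, 6},
                new double[] {4, 7}, new double[] {8, 1}, new double[] {7, 2});
        KdTree tree = new KdTree(pts);
        for (double[] p : tree.nearest(new double[] {9, 2}, 2)) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

Unlike the batch job, this answers a query without scanning every vector, at the cost of building and storing the tree; k-d trees also tend to lose much of that advantage as the number of dimensions grows.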

The batch version seems like it might be a nice place to start.

Hope this helps,


On Mon, Oct 10, 2011 at 3:26 PM, Felix Filozov <> wrote:
> I would like to perform a kNN similarity search, where each data point is an
> N-dimensional vector and each coordinate in the vector may take on any value
> (reals or strings). It seems to me that Mahout doesn't have the ability to
> perform a generic kNN similarity search; instead, the problem has to be
> mapped to a recommender. Is Mahout the right tool for this task?
> If it is, how have you dealt with the mapping, and if not, what would you
> recommend?
> Thanks.
> Felix

Twitter: @jpatanooga
Solution Architect @ Cloudera
