mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Generic approach to kNN
Date Tue, 11 Oct 2011 01:25:27 GMT
You need to encode these as numerical vectors.

The classes in org.apache.mahout.vectorizer.encoders can help convert
combined numerical, categorical and textual fields into a coherent vector
that can be used with standard distance measures.
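
As a rough illustration of the idea, here is a minimal, self-contained sketch
that hashes mixed fields into a fixed-size numeric vector and then runs a
brute-force kNN search with Euclidean distance. It does not use Mahout's
encoder classes; the class name MixedKnn, the methods encode, euclidean and
nearest, the vector size of 64, the positional field names ("f0", "f1", ...)
and the log scaling of numeric fields are all assumptions made for this
example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: hashed encoding of mixed records plus brute-force kNN.
// This is NOT Mahout's API; all names and constants here are illustrative.
class MixedKnn {
    // Assumed vector size; larger values reduce hash collisions.
    public static final int DIM = 64;

    // Encode a record whose fields are either Numbers or arbitrary values
    // (treated as categorical via toString) into a dense numeric vector.
    public static double[] encode(Object... fields) {
        double[] v = new double[DIM];
        for (int i = 0; i < fields.length; i++) {
            String name = "f" + i; // positional field name (assumption)
            if (fields[i] instanceof Number) {
                // Numeric field: one slot per field; log-scale the magnitude
                // so large values do not dominate the distance (assumption).
                double x = ((Number) fields[i]).doubleValue();
                v[Math.floorMod(name.hashCode(), DIM)] += Math.log1p(Math.abs(x));
            } else {
                // Categorical field: one slot per (field, value) pair.
                String key = name + "=" + fields[i];
                v[Math.floorMod(key.hashCode(), DIM)] += 1.0;
            }
        }
        return v;
    }

    // Standard Euclidean distance between two vectors of equal length.
    public static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Return the indices of the k vectors in data closest to query,
    // nearest first, by scanning and sorting all candidates.
    public static List<Integer> nearest(double[] query, List<double[]> data, int k) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < data.size(); i++) {
            idx.add(i);
        }
        idx.sort(Comparator.comparingDouble(i -> euclidean(query, data.get(i))));
        return new ArrayList<>(idx.subList(0, Math.min(k, idx.size())));
    }

    public static void main(String[] args) {
        List<double[]> data = new ArrayList<>();
        data.add(encode(1, "apple", "dog", 34, 8766));
        data.add(encode(3, "orange", "cat", 3738, 3737));
        double[] query = encode(1, "apple", "cat", 40, 9000);
        System.out.println(nearest(query, data, 1));
    }
}
```

Hashing the field name together with the value keeps "apple" in one field
distinct from "apple" in another. Two different keys can still land in the
same slot, so treat the vector size as something to tune for your data.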

On Mon, Oct 10, 2011 at 11:54 PM, Felix Filozov <ffilozov@gmail.com> wrote:

> I have a set of feature vectors. They're composed of integers and other
> non-numerical values. This means that I would need the ability to supply my
> own distance function. My data has no notion of users, just vectors.
>
> Example:
>
> vector 1: (1, apple, dog, 34, 8766)
> ...
> vector n: (3, orange, cat, 3738, 3737)
>
> I would like to know if Mahout can perform kNN similarity search using such
> arbitrary items/vectors. As a side question, can it perform that outside
> the context of a recommender? I think reducing some problems to a
> recommendation may be a bit awkward.
>
>
>
> On Monday, October 10, 2011, Sean Owen <srowen@gmail.com> wrote:
> > I think there are a lot of answers to this, depending on what exactly
> > you want. This is just one answer -- maybe you can clarify your
> > requirements.
> >
> > You want to just find the k most similar items, and you want to
> > construe this as a recommender problem?
> > The item-based recommenders have a mostSimilarItems() method. All it
> > does is find the k most similar items to the given item. It's just
> > applying a given similarity metric to search all possibilities. It
> > works on "items" but you can flip it around to work on users if you
> > like.
> >
> > Vectors really have to take on numeric values, or else they're not
> > really vectors! Are you trying to map discrete values to some numeric
> > range?
> >
> >
> > On Mon, Oct 10, 2011 at 8:26 PM, Felix Filozov <ffilozov@gmail.com> wrote:
> >> I would like to perform a kNN similarity search, where each data point is
> >> an N-dimensional vector and each coordinate in the vector may take on any
> >> value (reals or strings). It seems to me that Mahout doesn't have the
> >> ability to perform a generic kNN similarity search; instead, the problem
> >> has to be mapped to a recommender. Is Mahout the right tool for this task?
> >>
> >> If it is, how have you dealt with the mapping, and if not, what would you
> >> recommend?
> >>
> >> Thanks.
> >>
> >> Felix
> >>
> >
>
