Ted, does this apply to recommenders?

Let me describe my problem more simply: imagine you have a set of N feature
vectors, you're given a vector X (not in the set), and you're asked to find
the vector in the set that is nearest to X. I believe this is the classic
description of nearest-neighbor (NN) search.
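
To make this concrete, here is a minimal sketch in plain Python (not Mahout) of the brute-force search I have in mind. The names `mixed_distance` and `nearest`, and the choice of a 0/1 penalty for mismatched strings, are my own assumptions for illustration only; a real distance would need per-feature scaling.

```python
from typing import Callable, Sequence, Union

Value = Union[int, float, str]
Vector = Sequence[Value]


def mixed_distance(a: Vector, b: Vector) -> float:
    """Hypothetical distance: |x - y| for numbers, 0/1 mismatch for strings."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            # Categorical coordinate: no penalty if equal, 1.0 otherwise.
            total += 0.0 if x == y else 1.0
        else:
            # Numeric coordinate: absolute difference (unscaled).
            total += abs(x - y)
    return total


def nearest(query: Vector, candidates: Sequence[Vector],
            distance: Callable[[Vector, Vector], float] = mixed_distance) -> Vector:
    """Brute-force nearest neighbor: scan every candidate, keep the closest."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return min(candidates, key=lambda v: distance(query, v))


data = [(1, "apple", "dog", 34, 8766), (3, "orange", "cat", 3738, 3737)]
closest = nearest((2, "apple", "cat", 40, 8000), data)
```

This scans all N vectors per query, so it is O(N) per lookup, and the unscaled numeric coordinates dominate the string mismatches. It is only meant to show the shape of the problem: no users, no ratings, just vectors and a distance I supply.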

I've been making my way through Mahout in Action (I just realized you guys
are the authors; great book!) and some online tutorials, and it seems to me
that I'd have to do quite a lot of shoehorning to achieve my goal. I don't
really have a notion of a user or a rating, I would have to define
similarity myself, and any further optimization using k-d trees or LSH could
be difficult.

On Mon, Oct 10, 2011 at 9:25 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> You need to encode these as numerical vectors.
>
> The classes in org.apache.mahout.vectorizer.encoders can help convert
> combined numerical, categorical and textual fields into a coherent vector
> that can be used with standard distance measures.
>
> On Mon, Oct 10, 2011 at 11:54 PM, Felix Filozov <ffilozov@gmail.com> wrote:
>
> > I have a set of feature vectors. They're composed of integers and
> > non-numerical values. This means that I would need the ability to supply
> > my own distance function. My data has no notion of users, just vectors.
> >
> > Example:
> >
> > vector 1: (1, apple, dog, 34, 8766)
> > ...
> > vector n: (3, orange, cat, 3738, 3737)
> >
> > I would like to know if Mahout can perform kNN similarity search using
> > such arbitrary items/vectors. As a side question, can it perform that
> > outside the context of a recommender? I think reducing some problems to a
> > recommendation may be a bit awkward.
> >
> >
> >
> > On Monday, October 10, 2011, Sean Owen <srowen@gmail.com> wrote:
> > > I think there are a lot of answers to this, depending on what exactly
> > > you want. This is just one answer; maybe you can clarify your
> > > requirements.
> > >
> > > You want to just find the k most similar items, and you want to
> > > construe this as a recommender problem?
> > > The item-based recommenders have a mostSimilarItems() method. All it
> > > does is find the k most similar items to the given item. It's just
> > > applying a given similarity metric to search all possibilities. It
> > > works on "items" but you can flip it around to work on users if you
> > > like.
> > >
> > > Vectors really have to take on numeric values, or else they're not
> > > really vectors! Are you trying to map discrete values to some numeric
> > > range?
> > >
> > >
> > > On Mon, Oct 10, 2011 at 8:26 PM, Felix Filozov <ffilozov@gmail.com> wrote:
> > >> I would like to perform a kNN similarity search, where each data point
> > >> is an N-dimensional vector and each coordinate in the vector may take on
> > >> any value (reals or strings). It seems to me that Mahout doesn't have
> > >> the ability to perform a generic kNN similarity search; instead, the
> > >> problem has to be mapped to a recommender. Is Mahout the right tool for
> > >> this task?
> > >>
> > >> If it is, how have you dealt with the mapping, and if not, what would
> > >> you recommend?
> > >>
> > >> Thanks.
> > >>
> > >> Felix
> > >>
> > >
> >
>
