mahout-user mailing list archives

From Dmitriy Lyubimov
Subject Re: Top items in a vector
Date Mon, 25 Apr 2011 20:03:53 GMT
There's nothing in Mahout to do this (afaik).
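
For a single similarity vector, a plain heap-based selection may be all
that is needed. Below is a minimal sketch in plain Java, assuming the scores
have already been copied into a double[]. The TopK class and topIndices
method are made-up names for illustration, not Mahout API.

```java
import java.util.PriorityQueue;

// Hypothetical helper, not part of Mahout.
final class TopK {
    /** Returns the indices of the k largest scores, highest score first. */
    static int[] topIndices(double[] scores, int k) {
        k = Math.min(k, scores.length);
        // Min-heap ordered by score: the root is the weakest of the current top k.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Double.compare(scores[a], scores[b]));
        for (int i = 0; i < scores.length; i++) {
            if (heap.size() < k) {
                heap.add(i);
            } else if (k > 0 && scores[i] > scores[heap.peek()]) {
                heap.poll();   // evict the weakest candidate
                heap.add(i);
            }
        }
        // Polling yields ascending scores, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int j = result.length - 1; j >= 0; j--) {
            result[j] = heap.poll();
        }
        return result;
    }
}
```

This costs O(n log k) time and O(k) memory for a vector of length n, which
is fine when the whole vector fits in memory.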

There's a statistical method for doing this in linear time, which
is very easy to implement (and btw would make an excellent addition to
Mahout); google 'countsketch'.

It is an offline algorithm and, although I never considered it, it
might be embarrassingly parallelizable for MR.
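
To give a feel for the data structure, here is a rough sketch of a count
sketch in plain Java. Everything in it is an assumption made for
illustration: the class and method names are invented (this is not Mahout
API), items are identified by long ids, and a simple 64-bit mixer stands in
for the pairwise-independent hash families that the published analysis
assumes.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative only: names are made up, not Mahout API.
final class CountSketch {
    private final long[][] table;     // depth rows of width signed counters
    private final long[] bucketSeed;  // per-row seed for the bucket hash
    private final long[] signSeed;    // per-row seed for the +1/-1 hash
    private final int width;

    CountSketch(int depth, int width, long seed) {
        this.width = width;
        this.table = new long[depth][width];
        this.bucketSeed = new long[depth];
        this.signSeed = new long[depth];
        Random rnd = new Random(seed);
        for (int j = 0; j < depth; j++) {
            bucketSeed[j] = rnd.nextLong();
            signSeed[j] = rnd.nextLong();
        }
    }

    // SplitMix64 finalizer, used here as a stand-in hash function.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    private int bucket(int row, long item) {
        return (int) Math.floorMod(mix(item ^ bucketSeed[row]), (long) width);
    }

    private int sign(int row, long item) {
        return (mix(item ^ signSeed[row]) & 1L) == 0 ? 1 : -1;
    }

    /** Records count occurrences of item: one signed counter update per row. */
    void add(long item, long count) {
        for (int j = 0; j < table.length; j++) {
            table[j][bucket(j, item)] += sign(j, item) * count;
        }
    }

    /** Estimates the item's total count as the median of its per-row estimates. */
    long estimate(long item) {
        long[] est = new long[table.length];
        for (int j = 0; j < est.length; j++) {
            est[j] = sign(j, item) * table[j][bucket(j, item)];
        }
        Arrays.sort(est);
        return est[est.length / 2];  // middle element (upper median if depth is even)
    }
}
```

Each update touches one counter per row, so a pass over n records costs
O(n * depth). Two sketches built with the same seed and dimensions can be
combined by adding their tables element by element, which is why a
map/reduce version looks plausible: mappers build partial sketches and a
reducer sums them.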


On Mon, Apr 25, 2011 at 12:32 PM, Julian Limon wrote:
> Hello all,
> I'm using SVD to reduce the dimensionality of a text corpus. When I get
> queries, I generate a new matrix with them (based on the dictionary of the
> index) and apply the same matrix transformation. Finally, I
> multiply the (SVD'd) index matrix by the (SVD'd) query matrix to get a
> similarity vector for each query.
> My question is, is there a class (or a command-line instruction) that
> generates the top items from this vector? I know that Taste has an
> abstraction called "TopItems", but I wonder if a similar thing exists for
> vectors.
> Thanks a lot,
> Julian
