mahout-user mailing list archives

From Ted Dunning <>
Subject Re: sparsification of a Mahout vector
Date Sun, 02 Mar 2014 20:05:55 GMT

There isn't a fully baked answer to your needs, but there are components
that can help.  For instance, the OnlineSummarizer can find a particular
quantile.  Feeding it the values of the vector takes only a short loop:

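        import org.apache.mahout.math.Vector;
        import org.apache.mahout.math.stats.OnlineSummarizer;

        Vector v;  // original data
        OnlineSummarizer s = new OnlineSummarizer();
        for (Vector.Element e : v.all()) {
            // feed every element's value to the summarizer
            s.add(e.get());
        }

        // pick any cutoff you like
        double cutoff = s.quantile(0.99);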

Then you can use this cutoff to copy only the items you need:

        Vector r = new RandomAccessSparseVector(v.size());
        for (Vector.Element e : v.all()) {
            double vi = e.get();
            // copy only the values above the cutoff
            if (vi > cutoff) {
                r.set(e.index(), vi);
            }
        }

Note that if you want a truly sparse result, you have to perform a
selective copy: even if you set elements of a DenseVector to zero, it
still occupies the same amount of storage.  Only copying selectively to a
new vector of the right type actually reduces the storage used.
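
The quantile cutoff keeps roughly the top 1% of values, and because an
online summarizer estimates quantiles, the number of retained elements may
not match a requested s exactly.  If exactly the s largest values are
needed, one alternative is a small min-heap.  The sketch below is plain
Java rather than Mahout code: the class TopS and the method keepLargest are
hypothetical names, a double[] stands in for the dense vector, and a sorted
map stands in for the sparse result.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical helper, not part of Mahout. A plain array stands in for the
// dense vector and a sorted map (index -> value) for the sparse result.
final class TopS {
    // Returns the s largest entries of v, keyed by their index.
    static Map<Integer, Double> keepLargest(double[] v, int s) {
        Map<Integer, Double> result = new TreeMap<>();
        if (s <= 0) {
            return result;
        }
        // Min-heap of indices ordered by value: the root is always the
        // smallest of the entries kept so far.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Double.compare(v[a], v[b]));
        for (int i = 0; i < v.length; i++) {
            if (heap.size() < s) {
                heap.add(i);
            } else if (v[i] > v[heap.peek()]) {
                // v[i] beats the smallest kept value, so swap it in
                heap.poll();
                heap.add(i);
            }
        }
        // Selective copy: only the surviving indices reach the result.
        for (int i : heap) {
            result.put(i, v[i]);
        }
        return result;
    }
}
```

With a Mahout vector, the final loop would presumably call r.set(i, v.get(i))
on a RandomAccessSparseVector instead of filling a map.  The heap never
holds more than s indices, so the extra memory is proportional to s rather
than to the size of the vector.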

On Sun, Mar 2, 2014 at 10:31 AM, Chirag Lakhani wrote:

> Hi,
> I was wondering if there is a simple way to sparsify a vector in Mahout.  I
> basically have an n-dimensional vector (currently a DenseVector) and I want
> to develop a method that sparsifies it by keeping only the largest s values
> of the vector and setting the rest to 0.  Is there a simple solution to
> this given all that is included in the Vector class or do I need to create
> my own method?
> Chirag
> --
> *Chirag Lakhani*
> Data Scientist
> Zaloni, Inc.
> 633 Davis Dr., Suite 200
> Durham, NC 27713
> p: 919.602.4965 x7020
