mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: sparsification of a Mahout vector
Date Sun, 02 Mar 2014 20:05:55 GMT
Chirag,

There isn't a fully baked answer to your needs, but there are components
that can help you.  For instance, the OnlineSummarizer can estimate a
particular quantile of the values.  Filling it by iterating over the vector
is easy enough:

        Vector v;  // original data
        OnlineSummarizer s = new OnlineSummarizer();
        for (Vector.Element e : v.all()) {
            s.add(e.get());
        }

        // pick any cutoff you like; the 0.99 quantile keeps roughly the top 1% of values
        double cutoff = s.quantile(0.99);
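
If the goal is to keep the s largest of n values, the quantile level works
out to roughly 1 - s/n.  A minimal sketch in plain Java follows; the class and
method names are hypothetical and not part of Mahout.  It also shows an exact
cutoff obtained by sorting a copy of the values, assuming no ties at the
boundary:

```java
import java.util.Arrays;

final class CutoffForTopS {
    private CutoffForTopS() {}

    /** Quantile level that leaves about s of n values above the cutoff. */
    static double quantileFor(int s, int n) {
        if (n <= 0 || s < 0 || s > n) {
            throw new IllegalArgumentException("need n > 0 and 0 <= s <= n");
        }
        return 1.0 - (double) s / n;
    }

    /**
     * Exact cutoff such that "value > cutoff" keeps the s largest values,
     * assuming the values around the boundary are distinct.
     */
    static double exactCutoff(double[] values, int s) {
        if (s <= 0) {
            return Double.POSITIVE_INFINITY;   // nothing is above this
        }
        if (s >= values.length) {
            return Double.NEGATIVE_INFINITY;   // everything is above this
        }
        double[] sorted = values.clone();      // leave the input untouched
        Arrays.sort(sorted);
        // The (n - s)-th smallest value: exactly s values lie strictly above it.
        return sorted[sorted.length - s - 1];
    }
}
```

With n = 1000 and s = 10, quantileFor gives 0.99, matching the level used
above.  The sorting variant costs O(n log n) and extra memory, so it only makes
sense when an exact count matters more than a single streaming pass.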

Then you can use this cutoff to copy only the items you need:

        Vector r = new RandomAccessSparseVector(v.size());
        for (Vector.Element e : v.all()) {
            double vi = e.get();
            if (vi > cutoff) {
                r.set(e.index(), vi);
            }
        }

Note that if you want a truly sparse result, you have to perform a
selective copy: setting elements of a DenseVector to zero leaves its
storage unchanged.  Only by copying selectively into a new vector of a
sparse type can you get the desired effect.
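
When exactly s entries must survive rather than an approximate fraction, the
same selective copy can be driven by a small min-heap instead of a quantile
cutoff.  The sketch below is plain Java with hypothetical names; a TreeMap
stands in for the sparse vector, so it illustrates the idea rather than any
Mahout API:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

final class TopSSparsifier {
    private TopSSparsifier() {}

    /**
     * Returns index -> value for the s largest entries of a dense array.
     * Ties at the boundary are broken arbitrarily.
     */
    static Map<Integer, Double> keepLargest(double[] dense, int s) {
        Map<Integer, Double> sparse = new TreeMap<>();
        if (s <= 0) {
            return sparse;
        }
        // Min-heap of indices ordered by value: the root is the smallest kept entry.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Integer i) -> dense[i]));
        for (int i = 0; i < dense.length; i++) {
            if (heap.size() < s) {
                heap.add(i);
            } else if (dense[i] > dense[heap.peek()]) {
                heap.poll();   // evict the current smallest
                heap.add(i);
            }
        }
        // Selective copy: only the surviving entries reach the sparse container.
        for (int i : heap) {
            sparse.put(i, dense[i]);
        }
        return sparse;
    }
}
```

This runs in O(n log s) time with O(s) extra memory.  In Mahout the final
loop would call set(index, value) on a new RandomAccessSparseVector, as in
the copy loop above.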

On Sun, Mar 2, 2014 at 10:31 AM, Chirag Lakhani <clakhani@zaloni.com> wrote:

> Hi,
>
> I was wondering if there is a simple way to sparsify a vector in Mahout.  I
> basically have an n-dimensional vector (currently a DenseVector) and I want
> to develop a method that sparsifies it by keeping only the largest s values
> of the vector and setting the rest to 0.  Is there a simple solution to
> this given all that is included in the Vector class or do I need to create
> my own method?
>
> Chirag
>
> --
>
> *Chirag Lakhani*
>
> Data Scientist
>
> Zaloni, Inc. | www.zaloni.com
>
> 633 Davis Dr., Suite 200
>
> Durham, NC 27713
> e: clakhani@zaloni.com
> p: 919.602.4965 x7020
>
