mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Using SSVD for dimensionality reduction on Mahout
Date Wed, 19 Mar 2014 17:17:17 GMT
I am not sure we have direct CSV converters for that; CSV is not that
expressive anyway. But it should not be difficult to write such a converter
on your own, I suppose.

The steps you need to follow are these:

(1) Prepare a set of data points in the form of (unique vector key,
n-vector) tuples. The vector key can be anything that can be adapted into a
WritableComparable, notably Long or String. The key also has to be unique
for the output to make sense to you.
(2) Save the above tuples into a set of sequence files so that the sequence
file key is the unique vector key and the sequence file value is an
o.a.m.math.VectorWritable.
(3) Decide how many dimensions there will be in the reduced space. The key
word is "reduced", i.e. you don't need too many. Say 50.
(4) Run mahout ssvd --pca true --us true --v false -k <k> .... . The
reduced-dimensionality output will be in the folder USigma. The output will
have the same keys, bound to vectors in the reduced space of k dimensions.
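
As a rough illustration of step (1), here is a minimal sketch of the kind of
converter meant above. It is not part of Mahout: the class name
CsvToVectorTuples and its parse method are hypothetical, and it assumes the
first CSV column holds the unique key, every other column is numeric, and
there is no quoting or escaping in the file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Mahout class): turns CSV lines into
// (unique vector key, n-vector) tuples as described in step (1).
final class CsvToVectorTuples {

    // Assumption: column 0 is the unique row key, all remaining columns
    // are numeric. Quoted fields and embedded commas are not handled.
    static Map<String, double[]> parse(List<String> lines) {
        Map<String, double[]> tuples = new LinkedHashMap<>();
        int width = -1; // number of dimensions, fixed by the first data row
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = line.split(",");
            if (cols.length < 2) {
                throw new IllegalArgumentException("no values in row: " + line);
            }
            String key = cols[0].trim();
            double[] values = new double[cols.length - 1];
            for (int i = 1; i < cols.length; i++) {
                values[i - 1] = Double.parseDouble(cols[i].trim());
            }
            if (width < 0) {
                width = values.length;
            } else if (values.length != width) {
                throw new IllegalArgumentException("ragged row: " + line);
            }
            // Keys must be unique, otherwise the output rows are ambiguous.
            if (tuples.put(key, values) != null) {
                throw new IllegalArgumentException("duplicate key: " + key);
            }
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("row1,1.0,2.0,3.0", "row2,4.0,5.0,6.0");
        for (Map.Entry<String, double[]> e : parse(csv).entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

For step (2), each tuple would then be appended to a Hadoop
SequenceFile.Writer, for example with a Text key and a VectorWritable
wrapping a DenseVector built from the double[]. That part is left out here
because it needs the Hadoop and Mahout jars on the classpath.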


On Wed, Mar 19, 2014 at 9:45 AM, Vijay B <b.vijay.p14@gmail.com> wrote:

> Hi All,
> I have a CSV file on which I have to perform dimensionality reduction. I'm
> new to Mahout; after some searching I understood that SSVD can be used for
> dimensionality reduction. I'm not sure of the steps that have to be
> executed before SSVD, please help me.
>
> Thanks,
> Vijay
>
