mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Using SSVD for dimensionality reduction on Mahout
Date Wed, 19 Mar 2014 17:29:35 GMT
PS. The dspca method, which is an almost exact replica of SSVD with --pca
true, is also available on Spark and runs on exactly the same sequence-file
DRM (there is no CLI, though; it needs to be wrapped in Scala code) [1]. It
may perform a bit better than the MR version. If you are in the Scala world
and looking for an embedded API, this may be a better option for you to try.
It is new code, though, and we haven't collected data on its use yet, so it
would be awesome if you could try it.

[1] http://mahout.apache.org/users/sparkbindings/home.html


On Wed, Mar 19, 2014 at 10:17 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> I am not sure if we have direct CSV converters to do that; CSV is not that
> expressive anyway. But it is not difficult to write such a converter on
> your own, I suppose.
>
> The steps you need to follow are these:
>
> (1) prepare a set of data points in the form of (unique vector key,
> n-vector) tuples. The vector key can be anything that can be adapted into a
> WritableComparable, notably Long or String. The vector key also has to be
> unique for the output to make sense for you.
> (2) save the above tuples into a set of sequence files so that the sequence
> file key is the unique vector key, and the sequence file value is
> o.a.m.math.VectorWritable.
> (3) decide how many dimensions there will be in the reduced space. The
> point is reduction, i.e. you don't need too many. Say, 50.
> (4) run mahout ssvd --pca true --us true --v false -k <k> .... . The
> reduced-dimensionality output will be in the USigma folder. The output will
> have the same keys bound to vectors in the reduced space of k dimensions.
>
>
> On Wed, Mar 19, 2014 at 9:45 AM, Vijay B <b.vijay.p14@gmail.com> wrote:
>
>> Hi All,
>> I have a CSV file on which I have to perform dimensionality reduction. I'm
>> new to Mahout; after some searching I understood that SSVD can be used for
>> dimensionality reduction. I'm not sure of the steps that have to be
>> executed before SSVD. Please help me.
>>
>> Thanks,
>> Vijay
>>
>
>
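
Step (1) quoted above, turning CSV rows into (unique vector key, n-vector)
tuples, could be sketched roughly as follows. This is only an illustration
using the Java standard library: the class CsvToKeyedVectors, its KeyedVector
holder, and the "key,v1,...,vn" row layout are assumptions made for the
example, not Mahout code. It does not handle quoted CSV fields, and it does
not write sequence files; for step (2) each key and value array would still
have to be wrapped in suitable writables (with the value as
o.a.m.math.VectorWritable) and appended to a sequence file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: turns CSV rows into (unique key, n-vector) tuples. */
final class CsvToKeyedVectors {

    /** One data point: a unique row key plus its dense feature values. */
    static final class KeyedVector {
        final String key;
        final double[] values;

        KeyedVector(String key, double[] values) {
            this.key = key;
            this.values = values;
        }
    }

    /**
     * Parses lines of the assumed form "key,v1,v2,...,vn".
     * Rejects duplicate keys and rows whose width differs from the first row.
     */
    static List<KeyedVector> parse(List<String> lines, boolean hasHeader) {
        List<KeyedVector> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        int width = -1;
        for (int i = hasHeader ? 1 : 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",");
            String key = cells[0].trim();
            if (!seen.add(key)) {
                throw new IllegalArgumentException("duplicate key: " + key);
            }
            double[] values = new double[cells.length - 1];
            for (int j = 1; j < cells.length; j++) {
                values[j - 1] = Double.parseDouble(cells[j].trim());
            }
            if (width < 0) {
                width = values.length; // first data row fixes the dimension n
            } else if (values.length != width) {
                throw new IllegalArgumentException(
                    "row " + key + " has " + values.length
                        + " values, expected " + width);
            }
            out.add(new KeyedVector(key, values));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> csv =
            Arrays.asList("id,x1,x2,x3", "row1,1.0,2.5,0", "row2,0.5,0,4");
        for (KeyedVector kv : parse(csv, true)) {
            // In a real job, this is where the key and values would be
            // wrapped in writables and appended to a sequence file (step 2).
            System.out.println(kv.key + " -> " + Arrays.toString(kv.values));
        }
    }
}
```

Checking for duplicate keys and a consistent row width up front is a design
choice for the sketch: both problems are cheaper to catch while converting
than after the ssvd job has run.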
