mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Need help in executing SSVD for dimensionality reduction on Mahout
Date Tue, 18 Mar 2014 05:56:11 GMT
If the rows in the input for SSVD are the data points you are trying to
build a reduced space for, then the rows of USigma represent the same points
in the PCA (reduced) space. Input rows and output rows are matched by
having the same keys in the sequence files. However, your input does not
seem to use distinct key values (every row shown has key 1), which is not
recommended.
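To make the first point concrete, here is the linear algebra behind it. This is a sketch that assumes `-pca true` makes SSVD subtract the column means from the input before factoring it; $A$, $\mu$, $a_i$ and $k$ are notation introduced here, with $k$ being the `--rank` value (7 in the command below).

```latex
% A: m x n input matrix, one data point a_i^T per row; \mu: column means of A; k = --rank
A - \mathbf{1}\mu^{\top} \approx U \Sigma V^{\top},
\qquad U \in \mathbb{R}^{m \times k},\;
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\;
V \in \mathbb{R}^{n \times k},\; V^{\top} V = I_k

% Multiply both sides on the right by V and use V^T V = I_k:
U \Sigma \approx (A - \mathbf{1}\mu^{\top})\, V
\quad\Longrightarrow\quad
(U\Sigma)_{i,:} \approx (a_i - \mu)^{\top} V
```

So row $i$ of USigma is input row $i$, centered and projected onto the top $k$ principal directions (the columns of $V$); entry $j$ of that row is the point's score on the $j$-th component. The $\approx$ reflects that SSVD computes a randomized approximation of the truncated SVD; for an exact truncated SVD the second relation holds with equality.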

SSVD will also propagate names if NamedVector is used for the rows of the
input. That is another possible way to map input rows to their PCA-space
rows in USigma. However, the input in this case does not appear to use
named vectors.
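The key-based mapping described in the two paragraphs above can be sketched as follows. This is plain Python, not Mahout code: it assumes the input and USigma rows have already been dumped into (key, values) pairs like the `Key: ...: Value: {...}` listings in the quoted message, and `index_by_key`, `pair_input_with_usigma` and `with_distinct_keys` are hypothetical helpers. With NamedVector input, the vector name would play the role of the key.

```python
from typing import Dict, List, Sequence, Tuple

# (key, values): a sequence-file key (or, hypothetically, a NamedVector name) plus the row's entries.
Row = Tuple[str, List[float]]


def index_by_key(rows: Sequence[Row]) -> Dict[str, List[float]]:
    """Index rows by key; a repeated key makes the row mapping ambiguous, so reject it."""
    indexed: Dict[str, List[float]] = {}
    for key, values in rows:
        if key in indexed:
            raise ValueError(f"duplicate key {key!r}: rows cannot be matched unambiguously")
        indexed[key] = values
    return indexed


def pair_input_with_usigma(
    inputs: Sequence[Row], usigma: Sequence[Row]
) -> Dict[str, Tuple[List[float], List[float]]]:
    """Match each input row with the USigma row that carries the same key."""
    src = index_by_key(inputs)
    out = index_by_key(usigma)
    missing = sorted(src.keys() - out.keys())
    if missing:
        raise KeyError(f"no USigma row for keys {missing}")
    return {key: (src[key], out[key]) for key in src}


def with_distinct_keys(csv_lines: Sequence[str]) -> List[Row]:
    """Hypothetical fix: key each CSV record by its 0-based line number."""
    return [
        (str(i), [float(field) for field in line.split(",")])
        for i, line in enumerate(csv_lines)
    ]


if __name__ == "__main__":
    sample = [
        "22,2,44,36,5,9,2824,2,4,733,285,169",
        "25,1,150,175,3,9,4037,2,18,1822,254,171",
    ]
    # Both rows keyed "1", as in the quoted dump: the mapping is ambiguous.
    try:
        index_by_key([("1", [22.0]), ("1", [25.0])])
    except ValueError as err:
        print(err)
    keyed = with_distinct_keys(sample)
    print([key for key, _ in keyed])  # ['0', '1']
```

Rejecting duplicate keys outright, rather than letting a later row silently overwrite an earlier one, mirrors the point above: with every key equal to 1, there is no way to tell which USigma row belongs to which input row.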


On Mon, Mar 17, 2014 at 10:22 PM, Vijaya Pratap <bvpratap1@gmail.com> wrote:

> Hi,
>
> I am trying to use SSVD for dimensionality reduction in Mahout. The input
> is sample data in CSV format. Below is a snippet of the input:
>
> 22,2,44,36,5,9,2824,2,4,733,285,169
> 25,1,150,175,3,9,4037,2,18,1822,254,171
>
> I have executed the following steps:
>
> 1. Loaded the CSV file and vectorized the data following the steps
> described at https://github.com/tdunning/pig-vector, with TextConverter
> for the key and VectorWritable for the value. Below is the output of
> this step. I believe values such as 420468 and 279945 are indices; please
> correct me if I am wrong.
>
> Key: 1: Value: {420468:733.0,279945:2.0,607618:285.0,107323:4.0,88330:2.0,263605:9.0,975378:169.0,796003:2824.0,899937:44.0,422862:5.0,723271:22.0,508675:36.0}
> Key: 1: Value: {420468:1822.0,279945:2.0,607618:254.0,107323:18.0,88330:1.0,263605:9.0,975378:171.0,796003:4037.0,899937:150.0,422862:3.0,723271:25.0,508675:175.0}
>
> 2. Passed the output of the above step to SSVD as follows:
>
> bin/mahout ssvd -i /user/cloudera/vectorized_data/ -o /user/cloudera/reduced_dimensions --rank 7 -us true -V false -U false -pca true -ow -t 1
>
> Below is a snippet of the output in the USigma folder:
>
> Key: 1: Value: {0:190.78376981262613,1:350.30406212052424,2:78.24932121461198,3:98.67283686605012,4:-122.95056058078157,5:-4.201436498582381,6:-1.4370820809434337}
> Key: 1: Value: {0:1295.933786837574,1:-698.5629072274602,2:-24.15996813349674,3:60.936737740013946,4:11.859426028893711,5:-6.379057682687426,6:0.9356299409590896}
>
> Please let me know whether my approach is correct, and help me interpret
> the output in the USigma folder.
>
>
> Thanks in advance
> Pratap
>
