From: PEDRO MANUEL JIMENEZ RODRIGUEZ
To: user@mahout.apache.org
Subject: Lanczos Algorithm
Date: Fri, 19 Nov 2010 09:17:51 +0100

Dear Mahout developers,

I'm a Computer Science student at the National University of Distance Education in Spain, and I'm currently working on my final-year project, which is about Diffusion Maps.

Diffusion Maps is a dimensionality reduction method, and it relies on the Lanczos algorithm. That algorithm is already implemented in the latest Mahout release, in the LanczosSolver class, but we foresee the need to run it as a distributed computation: this implementation of Diffusion Maps has to deal with extremely large matrices, so distributed computation is critical for me.
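
The Lanczos step that the Diffusion Maps computation depends on can be illustrated with a small in-memory sketch. Assumptions: a dense symmetric matrix stored as double[][], a caller-chosen seed vector and step count k, and full reorthogonalization, which keeps the example numerically tidy but is only affordable for small k. The class and method names are hypothetical; this is neither Mahout's LanczosSolver nor its distributed counterpart. The seed helper sums the rows of the matrix, which is the seeding described in the LanczosSolver documentation quoted later in this message.

```java
// Hypothetical, minimal dense Lanczos tridiagonalization for illustration only.
// Not Mahout's LanczosSolver; all names here are made up for this sketch.
final class LanczosSketch {

    /** Seed vector: the sum of all rows of a. */
    static double[] sumOfRows(double[][] a) {
        double[] s = new double[a[0].length];
        for (double[] row : a) {
            for (int j = 0; j < row.length; j++) {
                s[j] += row[j];
            }
        }
        return s;
    }

    /**
     * Runs k Lanczos steps (1 <= k <= n) on the symmetric n x n matrix a.
     * Returns {alpha, beta}: the diagonal (length k) and the
     * off-diagonal (length k - 1) of the tridiagonal matrix T = Q^T A Q.
     */
    static double[][] tridiagonalize(double[][] a, double[] start, int k) {
        int n = a.length;
        double[][] q = new double[k][];
        double[] alpha = new double[k];
        double[] beta = new double[Math.max(k - 1, 0)];
        q[0] = scale(start, 1.0 / Math.sqrt(dot(start, start)));
        for (int j = 0; j < k; j++) {
            double[] w = multiply(a, q[j]);
            alpha[j] = dot(w, q[j]);
            // Full reorthogonalization (modified Gram-Schmidt) against all previous
            // Lanczos vectors; in exact arithmetic this equals the three-term
            // recurrence w = A q_j - alpha_j q_j - beta_{j-1} q_{j-1}.
            for (int i = 0; i <= j; i++) {
                double c = dot(w, q[i]);
                for (int t = 0; t < n; t++) {
                    w[t] -= c * q[i][t];
                }
            }
            if (j == k - 1) {
                break; // T is complete; no further Lanczos vector is needed
            }
            beta[j] = Math.sqrt(dot(w, w));
            if (beta[j] < 1e-12) {
                throw new IllegalStateException("Lanczos breakdown at step " + j);
            }
            q[j + 1] = scale(w, 1.0 / beta[j]);
        }
        return new double[][] {alpha, beta};
    }

    static double dot(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) {
            s += x[i] * y[i];
        }
        return s;
    }

    static double[] scale(double[] x, double factor) {
        double[] r = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            r[i] = x[i] * factor;
        }
        return r;
    }

    static double[] multiply(double[][] a, double[] x) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = dot(a[i], x);
        }
        return r;
    }
}
```

Each step costs one matrix-vector product, which is typically the operation a distributed solver spreads across machines. Run for k = n steps on the 5 x 5 matrix below, the diagonal of T sums to trace(A) up to rounding, because Q is then a complete orthonormal basis; in practice k is much smaller than n, and the eigenvalues of T (the Ritz values) approximate the extreme eigenvalues of A first.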

I have noticed that there is a DistributedLanczosSolver class in the Mahout library, but I can't access its source code because it isn't in the latest release of Mahout.

Could you please let me know whether I could get access to the source code of this class?

I would also like to ask how the LanczosSolver implementation works. I have run some tests comparing this class with another program, written in R, that uses a library called ARPACK, which also uses the Lanczos algorithm. When I calculate the eigenvalues and eigenvectors of a symmetric matrix, I don't get the same results. For example:
For this matrix:
 4.42282138   1.51744077   0.07690571   0.93650042   2.19609401
 1.51744077   1.73849477  -0.11856149   0.76555191   1.3673608
 0.07690571  -0.11856149   0.55065932   1.72163263  -0.2283693
 0.93650042   0.76555191   1.72163263   0.09470345  -1.16626194
 2.19609401   1.3673608   -0.2283693   -1.16626194  -0.37321311

Results for R:
Eigenvalues
-0.6442398  1.1084103  2.3946915  6.2018925
Eigenvectors
            [,1]         [,2]          [,3]         [,4]
[1,] -0.17050824   0.46631043  -0.010360993   0.83660453
[2,] -0.06455473  -0.87762807  -0.008814402   0.40939079
[3,]  0.68602882   0.04706265  -0.666429293   0.02602181
[4,] -0.39567054  -0.07491643  -0.670834157   0.12161492
[5,]  0.58272541  -0.06705358   0.325066897   0.34208875

Results for Java:
Eigenvalues
0.0  0.007869004183962289  0.023293016691817894  0.10872358093523908  0.13087002850143611

I never get the same eigenvalues, and I think this is because the class documentation says:

"To avoid floating point overflow problems which arise in power-methods like Lanczos, an initial pass is made through the input matrix to generate a good starting seed vector by summing all the rows of the input matrix, and compute the trace(inputMatrix^t*matrix). This latter value, being the sum of all of the singular values, is used to rescale the entire matrix, effectively forcing the largest singular value to be strictly less than one, and transforming floating point overflow problems into floating point underflow (ie, very small singular values will become invisible, as they will appear to be zero and the algorithm will terminate)."

Is it possible to get the eigenvalues back to their original values?
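
Dividing a matrix by a positive constant c divides each of its eigenvalues by c and leaves the eigenvectors unchanged, so a rescaling like the one quoted is reversible in principle. The sketch below assumes a dense symmetric matrix stored as double[][]; all names are hypothetical, and it does not reproduce what LanczosSolver does internally. It computes trace(AᵀA), which works out to the sum of the squared singular values (the squared Frobenius norm), divides the matrix by it, and then recovers an eigenvalue on the original scale in two ways: by multiplying back by the same factor, or, when the factor a solver used is not exposed, by evaluating the Rayleigh quotient vᵀAv / vᵀv of a returned eigenvector against the original matrix.

```java
// Hypothetical sketch: mimics a trace-based rescaling and shows how to undo it.
// Not Mahout API; names are made up for illustration.
final class RescaleSketch {

    /** trace(A^T A), computed as the sum of squared entries (squared Frobenius norm). */
    static double traceOfGram(double[][] a) {
        double s = 0;
        for (double[] row : a) {
            for (double x : row) {
                s += x * x;
            }
        }
        return s;
    }

    /** Returns a new matrix equal to a / factor; a itself is left untouched. */
    static double[][] divide(double[][] a, double factor) {
        double[][] r = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            r[i] = new double[a[i].length];
            for (int j = 0; j < a[i].length; j++) {
                r[i][j] = a[i][j] / factor;
            }
        }
        return r;
    }

    /** Rayleigh quotient v^T a v / v^T v: the eigenvalue of a when v is an exact eigenvector. */
    static double rayleighQuotient(double[][] a, double[] v) {
        double num = 0;
        double den = 0;
        for (int i = 0; i < a.length; i++) {
            double av = 0; // (a v)_i
            for (int j = 0; j < v.length; j++) {
                av += a[i][j] * v[j];
            }
            num += v[i] * av;
            den += v[i] * v[i];
        }
        return num / den;
    }
}
```

The Rayleigh quotient is insensitive to the sign and length of v, so it also works on vectors that differ from R's only by sign; its accuracy depends on how accurate the returned eigenvector is.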

Eigenvectors
-0.83660453   0.23122937   0.010360993   0.46631043  -0.17050824
-0.40939079   0.24067227   0.008814402  -0.87762807  -0.06455473
-0.02602181   0.28695718   0.666429293   0.04706265   0.68602882
-0.12161492  -0.61075665   0.670834157  -0.07491643  -0.39567054
-0.34208875  -0.65821099  -0.325066897  -0.06705358   0.58272541
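
Because the two programs order, sign and scale their output differently, a check that does not depend on any of those conventions is to plug each reported pair back into the original matrix and measure the relative residual ||Av - λv|| / ||v||, which is close to zero for a true eigenpair (here limited by the 7 to 8 digits printed). A minimal sketch; the class and method names are hypothetical:

```java
// Hypothetical helper: measures how far (lambda, v) is from being an eigenpair of a.
final class EigenResidualCheck {

    /** Returns ||a v - lambda v||_2 / ||v||_2; close to 0 for a true eigenpair. */
    static double relativeResidual(double[][] a, double lambda, double[] v) {
        double residualSq = 0;
        double normSq = 0;
        for (int i = 0; i < a.length; i++) {
            double r = -lambda * v[i]; // start from -lambda * v_i, then add (a v)_i
            for (int j = 0; j < v.length; j++) {
                r += a[i][j] * v[j];
            }
            residualSq += r * r;
            normSq += v[i] * v[i];
        }
        return Math.sqrt(residualSq / normSq);
    }
}
```

Flipping the sign of v leaves the residual unchanged, so R's fourth column and the first Java column (its negation) both give a residual near zero with R's eigenvalue 6.2018925, while pairing the same vector with one of the rescaled Java eigenvalues gives a residual of roughly 6.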

It always happens the same way: I have to force the calculation of N vectors (for an N x N matrix) to obtain the same eigenvectors as R, except for the sign of some of the values, which is acceptable. I thought this implementation of the algorithm would return the eigenvectors sorted, but every time I get an extra vector, one I don't want to calculate, placed between them. In the example above it's the second one from the left. Why does this happen?
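
To line up the two outputs despite their different ordering and signs, the pairs can be sorted by eigenvalue and each vector flipped so that its largest-magnitude component is positive. A minimal sketch; the class name, the descending order and this particular sign convention are assumptions chosen for illustration (any fixed convention works), and the eigenvalues passed in should already be on the matrix's original scale:

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helpers for lining up eigenpairs produced by two different solvers.
final class EigenCanonicalizer {

    /** Flips v in place so that its largest-magnitude component is positive. */
    static void fixSign(double[] v) {
        int k = 0;
        for (int i = 1; i < v.length; i++) {
            if (Math.abs(v[i]) > Math.abs(v[k])) {
                k = i;
            }
        }
        if (v[k] < 0) {
            for (int i = 0; i < v.length; i++) {
                v[i] = -v[i];
            }
        }
    }

    /** Returns the indices of the eigenpairs ordered by eigenvalue, largest first. */
    static Integer[] orderDescending(double[] eigenvalues) {
        Integer[] idx = new Integer[eigenvalues.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingDouble((Integer p) -> eigenvalues[p]).reversed());
        return idx;
    }
}
```

With this convention the first Java column becomes identical to R's fourth column, and R's third column becomes identical to the third Java column.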

Thanks in advance.

Kind regards,
Pedro