From: PEDRO MANUEL JIMENEZ RODRIGUEZ
Delivered-To: mailing list user@mahout.apache.org
Subject: Lanczos Algorithm
Date: Fri, 19 Nov 2010 09:17:51 +0100

Dear Mahout developers,

I'm a Computer Science student at the National University of Distance Education in Spain. I'm currently working on my final-year project, which is about Diffusion Maps. This method is used for dimensionality reduction, and it relies on the Lanczos algorithm. The algorithm is already implemented in the latest release of Mahout, in the LanczosSolver class, but we foresee the need to run it as a distributed computation. This implementation of Diffusion Maps has to deal with extremely large matrices, so distributed computation is critical for me.

I have noticed that there is a DistributedLanczosSolver class in the Mahout library, but I can't access its source code because it isn't in the latest release of Mahout. Could you please let me know how I can get the source code of this class?

I would also like to ask how the LanczosSolver implementation works. I have run some tests comparing this class with another program, written in R. That program uses a library called ARPACK, which also uses the Lanczos algorithm. When I compute the eigenvalues and eigenvectors of a symmetric matrix, I don't get the same results.
For example, for this matrix:

   4.42282138   1.51744077   0.07690571   0.93650042   2.19609401
   1.51744077   1.73849477  -0.11856149   0.76555191   1.3673608
   0.07690571  -0.11856149   0.55065932   1.72163263  -0.2283693
   0.93650042   0.76555191   1.72163263   0.09470345  -1.16626194
   2.19609401   1.3673608   -0.2283693   -1.16626194  -0.37321311

Results for R:

Eigenvalues:

  -0.6442398  1.1084103  2.3946915  6.2018925

Eigenvectors:

              [,1]         [,2]          [,3]        [,4]
  [1,] -0.17050824   0.46631043  -0.010360993  0.83660453
  [2,] -0.06455473  -0.87762807  -0.008814402  0.40939079
  [3,]  0.68602882   0.04706265  -0.666429293  0.02602181
  [4,] -0.39567054  -0.07491643  -0.670834157  0.12161492
  [5,]  0.58272541  -0.06705358   0.325066897  0.34208875

Results for Java:

Eigenvalues:

  0.0  0.007869004183962289  0.023293016691817894  0.10872358093523908  0.13087002850143611

I never get the same eigenvalues. I think this is because the documentation of the class says:

  "To avoid floating point overflow problems which arise in power-methods like Lanczos, an initial pass is made through the input matrix to generate a good starting seed vector by summing all the rows of the input matrix, and compute the trace(inputMatrix^t * matrix).

  This latter value, being the sum of all of the singular values, is used to rescale the entire matrix, effectively forcing the largest singular value to be strictly less than one, and transforming floating point overflow problems into floating point underflow (ie, very small singular values will become invisible, as they will appear to be zero and the algorithm will terminate)."

Is it possible to return the eigenvalues to their original values?
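
One possible way to get the original eigenvalues back without knowing the exact scale factor is to recompute each one from the unscaled matrix with the Rayleigh quotient, (v^T A v) / (v^T v). This is only a sketch: it assumes the rescaling is a division of the whole matrix by a single nonzero scalar, which leaves the eigenvectors unchanged; whether that is all LanczosSolver does is an assumption here. The class and method names are hypothetical, and plain Java arrays stand in for Mahout's matrix and vector types. The data are the test matrix from this message and the first eigenvector column that the Java run reports for it.

```java
// Hypothetical helper, not part of Mahout: recovers an eigenvalue of the
// original (unscaled) matrix from one of its eigenvectors.
class RayleighQuotient {

    // The symmetric 5 x 5 test matrix from this message.
    static final double[][] A = {
        {4.42282138,  1.51744077,  0.07690571,  0.93650042,  2.19609401},
        {1.51744077,  1.73849477, -0.11856149,  0.76555191,  1.3673608},
        {0.07690571, -0.11856149,  0.55065932,  1.72163263, -0.2283693},
        {0.93650042,  0.76555191,  1.72163263,  0.09470345, -1.16626194},
        {2.19609401,  1.3673608,  -0.2283693,  -1.16626194, -0.37321311}
    };

    // First eigenvector column reported by the Java run.
    static final double[] V = {
        -0.83660453, -0.40939079, -0.02602181, -0.12161492, -0.34208875
    };

    /** Returns (v^T a v) / (v^T v) for a square matrix a. */
    static double rayleighQuotient(double[][] a, double[] v) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < v.length; i++) {
            double av = 0.0; // i-th entry of the product a * v
            for (int j = 0; j < v.length; j++) {
                av += a[i][j] * v[j];
            }
            numerator += v[i] * av;
            denominator += v[i] * v[i];
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Should land close to 6.2018925, the largest eigenvalue reported by R.
        System.out.println(rayleighQuotient(A, V));
    }
}
```

Because the quotient divides by v^T v, it does not matter whether the eigenvector is normalized or what its sign is.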
Eigenvectors:

  -0.83660453   0.23122937   0.010360993   0.46631043  -0.17050824
  -0.40939079   0.24067227   0.008814402  -0.87762807  -0.06455473
  -0.02602181   0.28695718   0.666429293   0.04706265   0.68602882
  -0.12161492  -0.61075665   0.670834157  -0.07491643  -0.39567054
  -0.34208875  -0.65821099  -0.325066897  -0.06705358   0.58272541

The same thing always happens. I have to force the calculation of N vectors (for an N x N matrix) to obtain the same eigenvector values, except for the sign of some of them, which is acceptable. I thought this implementation of the algorithm would return the eigenvectors sorted, but every time I get a vector that I didn't ask for in among them. In the example above it's the second one from the left. Why does this happen?

Thanks in advance.

Kind regards,
Pedro
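
On the sign differences mentioned in this message: if v is an eigenvector, then so is -v, so two solvers can legitimately disagree on the sign of a whole column. As a minimal sketch of a comparison that ignores such a global sign flip, the helper below checks columns taken from the R and Java results in this message. The class and method names are hypothetical and plain arrays are used in place of Mahout vectors.

```java
// Hypothetical helper, not part of Mahout: compares eigenvectors up to sign.
class SignInvariantCompare {

    /** True if a equals b or -b, entry by entry, within the tolerance tol. */
    static boolean sameUpToSign(double[] a, double[] b, double tol) {
        if (a.length != b.length) {
            return false;
        }
        boolean same = true;    // a == b so far
        boolean flipped = true; // a == -b so far
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) > tol) {
                same = false;
            }
            if (Math.abs(a[i] + b[i]) > tol) {
                flipped = false;
            }
        }
        return same || flipped;
    }

    public static void main(String[] args) {
        // Column [,4] of the R eigenvectors.
        double[] rCol4 = {0.83660453, 0.40939079, 0.02602181, 0.12161492, 0.34208875};
        // First and second columns of the Java eigenvectors.
        double[] javaCol1 = {-0.83660453, -0.40939079, -0.02602181, -0.12161492, -0.34208875};
        double[] javaCol2 = {0.23122937, 0.24067227, 0.28695718, -0.61075665, -0.65821099};

        System.out.println(sameUpToSign(rCol4, javaCol1, 1e-8)); // prints true
        System.out.println(sameUpToSign(rCol4, javaCol2, 1e-8)); // prints false
    }
}
```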