From: Steven Buss
Date: Wed, 24 Feb 2010 16:53:35 -0500
Subject: Symmetric eigendecomposition for kernel PCA
To: mahout-user@lucene.apache.org

I was chatting with Jake Mannix on Twitter about MAHOUT-180 and whether that patch is suitable for sparse symmetric positive-definite matrices, and he suggested we continue the conversation on the mailing list, so:

My research partner and I have a dataset that consists of 400,000 users and 1.6 million articles, with about 22 million nonzeros. We are trying to use this data to make recommendations to users. We have tried using the SVD and PLSI, both with unsatisfactory results, and are now attempting kPCA.

We have a 400,000 by 400,000 sparse symmetric positive-definite matrix, H, for which we need the top couple hundred eigenvectors and eigenvalues. Jake has told me that I can use MAHOUT-180 unchanged, but it will be doing redundant work and the output eigenvalues will be the squares of the ones we actually want. This sounds like a good approach, but it would be great if Mahout had an optimized eigendecomposition for symmetric matrices. Jake suggested I submit a JIRA ticket regarding this, which I plan to do.

H is the pairwise distance in feature space (calculated using a kernel function) between each pair of users (or some subset of users). After I mentioned this to Jake, he asked me "why aren't you just doing it all in one go? Kernelize on the rows, and do SVD on that? Why do the M*M^t intermediate step?" Unfortunately, I'm not sure what you're asking, Jake. Can you clarify?

Steven Buss
steven.buss@gmail.com
http://www.stevenbuss.com/
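
To illustrate the remark about squared eigenvalues, here is a minimal sketch in plain Python. It is not Mahout code: the 2x2 matrix `H` is a hypothetical toy example, the helper names are made up for illustration, and power iteration stands in for the Lanczos solver. It assumes the solver effectively works on H^T H, which is one reading of Jake's comment. For a symmetric H, H^T H equals H^2, so its eigenvalues are the squares of those of H and taking a square root recovers them.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Dense matrix product a * b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def top_eigenvalue(m, iterations=200):
    """Dominant eigenvalue of a symmetric matrix via power iteration."""
    v = [1.0] + [0.5] * (len(m) - 1)  # arbitrary nonzero start vector
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v; v has unit length at this point.
    return sum(x * y for x, y in zip(v, matvec(m, v)))

# Hypothetical toy symmetric positive-definite matrix (eigenvalues 3 and 1).
H = [[2.0, 1.0],
     [1.0, 2.0]]

# What a solver operating on H^T H would see.
HtH = matmul(transpose(H), H)

lam_H = top_eigenvalue(H)      # eigenvalue we actually want
lam_HtH = top_eigenvalue(HtH)  # squared eigenvalue

print(lam_H, lam_HtH, math.sqrt(lam_HtH))
```

The eigenvectors are the same in both cases; only the eigenvalues need the square root. The redundant work comes from applying H twice per iteration where a symmetric solver would apply it once.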