From: Ted Dunning
Date: Mon, 5 Jul 2010 16:08:50 -0700
Subject: Re: SVD and Clustering
To: user@mahout.apache.org

On Mon, Jul 5, 2010 at 12:34 PM, Grant Ingersoll wrote:

>
> On Jul 5, 2010, at 1:17 PM, Ted Dunning wrote:
>
> > Yes to this.
> >
> > On Mon, Jul 5, 2010 at 6:43 AM, Grant Ingersoll wrote:
> >
> >> is it just seen as a general way of doing feature reduction and therefore
> >> it makes sense to do.
>
> Should I normalize my vectors before doing SVD or after or not at all?

Yes.  :-)

Any of these can help.  Normalizing before will probably not have a huge effect, but could be helpful if you have certain kinds of odd documents.  Normalizing document vectors after SVD may be critical to avoid problems with eigenspokes.  Avoiding normalization is important in certain other situations.

So the answer to your two binary questions, expressed as four possible options, is "Yes".

Try it and apply the laugh test to each option.
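For readers who want to try the four options side by side, here is a minimal sketch in pure Python. It assumes a small, dense document-by-term matrix held in memory and approximates the top-k right singular vectors with plain power iteration plus deflation; that is a stand-in chosen so the example is self-contained, not Mahout's SVD code. The function names (l2_normalize, top_right_singular_vectors, reduce_documents) are hypothetical. The two flags map onto the two binary questions: normalize_before rescales each document to unit length before the decomposition (which makes a very long document look like a short one with the same term proportions), and normalize_after rescales each reduced vector to unit length, so documents pointing in the same direction in the reduced space (for example, along the same eigenspoke) land on the same point.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def matvec(m, v):
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gram(a):
    """Return A^T A for a row-major matrix A (documents x terms)."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


def top_right_singular_vectors(a, k, iters=200, seed=0):
    """Approximate the top-k right singular vectors of A.

    Stand-in solver (an assumption, not Mahout's): power iteration on A^T A,
    removing components along already-found vectors (Gram-Schmidt deflation).
    """
    rng = random.Random(seed)
    g = gram(a)
    basis = []
    for _ in range(k):
        v = l2_normalize([rng.random() for _ in range(len(g))])
        for _ in range(iters):
            w = matvec(g, v)
            for b in basis:
                dot = sum(x * y for x, y in zip(w, b))
                w = [x - dot * y for x, y in zip(w, b)]
            v = l2_normalize(w)
        basis.append(v)
    return basis


def reduce_documents(docs, k, normalize_before=False, normalize_after=False):
    """Project documents onto k singular directions, with optional normalization."""
    rows = [l2_normalize(d) if normalize_before else list(d) for d in docs]
    basis = top_right_singular_vectors(rows, k)
    reduced = [[sum(x * y for x, y in zip(row, b)) for b in basis] for row in rows]
    if normalize_after:
        reduced = [l2_normalize(r) for r in reduced]
    return reduced


if __name__ == "__main__":
    # Toy term counts (made up for illustration); the last document is unusually long.
    docs = [
        [3.0, 1.0, 0.0, 0.0],
        [2.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 4.0],
        [0.0, 1.0, 0.0, 30.0],
    ]
    # All four answers to the two binary questions.
    for before in (False, True):
        for after in (False, True):
            print(before, after, reduce_documents(docs, 2, before, after))
```

Comparing the four printed projections (or, on real data, the clusters built from them) is one cheap way to apply the laugh test to each option.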