From: Yang
Date: Tue, 10 Sep 2013 17:48:37 -0700
Subject: SVD, how are the missing values treated?
To: user@mahout.apache.org

In the simple equation describing SVD,

A = U S V^T

I guess the original matrix A has to have every value filled in so that the calculation can be carried out, right?

But the Mahout package described here:

https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
https://cwiki.apache.org/confluence/display/MAHOUT/SVD+-+Singular+Value+Decomposition

allows the input to be sparse, so most elements of A are missing values. So I wonder how Mahout takes care of the missing values.

This paper: http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA439541
fills missing values with some sort of average, which sounds rather arbitrary.

Thanks,
Yang
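
To make the question concrete, here is a minimal pure-Python sketch of the two readings contrasted above: treating every unstored entry as an explicit 0 (what a sparse storage format does by default) versus filling it with a column average (the kind of imputation the paper describes). The toy matrix and the helper names (RATINGS, fill_zero, fill_column_mean, top_singular_value) are hypothetical and not part of Mahout; the sketch says nothing about which reading Mahout's solvers use, only that the two give different decompositions.

```python
import math

# Toy ratings matrix; None marks an entry that was never observed.
# (Hypothetical data, for illustration only.)
RATINGS = [
    [5.0, None, 1.0],
    [4.0, None, None],
    [None, 2.0, 5.0],
]


def fill_zero(rows):
    """Sparse-storage reading: every unstored entry becomes an explicit 0."""
    return [[0.0 if v is None else v for v in row] for row in rows]


def fill_column_mean(rows):
    """Imputation reading: each missing entry becomes its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        seen = [row[j] for row in rows if row[j] is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def top_singular_value(a, iters=500):
    """Estimate the largest singular value by power iteration on A^T A."""
    n_rows, n_cols = len(a), len(a[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        # One step: w = A^T (A v), then renormalise.
        av = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(a[i][j] * av[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # With v a unit vector, ||A v|| approximates the top singular value.
    av = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return math.sqrt(sum(x * x for x in av))


if __name__ == "__main__":
    print("zero-filled:", top_singular_value(fill_zero(RATINGS)))
    print("mean-filled:", top_singular_value(fill_column_mean(RATINGS)))
```

The two fills yield different leading singular values, so "sparse" and "missing" are not interchangeable: whichever value stands in for an unobserved entry becomes part of the matrix that is actually decomposed.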