From: Sean Owen
To: mahout-user@lucene.apache.org
Date: Fri, 27 Nov 2009 12:01:04 +0000
Subject: Re: Mahout/Taste covariance between two items

I'm not so familiar with this formula, but you seem to be missing
something in the denominator... it's got to normalize somehow. I think I
said divide by standard deviation, but that's not quite it. What you are
really summing are the products of z-scores. See
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

But I think you should just use the formulation given in the code; it
should give the same result. At least I hope these aren't different
definitions of Pearson!

On Fri, Nov 27, 2009 at 10:20 AM, jamborta wrote:
>
> thank you. much clearer now.
>
> so for my purpose this will do:
>
> sumXY/(N-1)
>
> given that the data is 'centered'?
>
>
> On Fri, Nov 27, 2009 at 1:41 AM, jamborta wrote:
>>
>> hi. I tried to figure out how you calculate Pearson correlation, but it
>> looks like you use this formula:
>>
>> sumXY / sqrt(sumX2 * sumY2)
>
> Yes, that's right -- this is what Pearson reduces to when the means of X
> and Y are 0. And they are here -- the implementation 'centers' the
> data.
>
>> where sumXY = sumXY - meanY * sumX;
>>       sumX2 = sumX2 - meanX * sumX;
>>       sumY2 = sumY2 - meanY * sumY;
>
> You see the lines commented out there? Those are the full forms of the
> expressions, which may make more sense.
> This is centering the data, making the mean 0.
>
> This is a simplification based on the observation that, for example,
> sumX * meanY = sumY * meanX = n * meanY * meanX.
>
>>
>> I don't really understand how you got these equations. Could you explain
>> it to me? I thought Pearson correlation would be like this:
>>
>> E[(x_i-meanX)(y_i-meanY)] / (sdX*sdY)
>
> That's right, that's the expression for a population correlation, but
> we can really only compute a sample Pearson correlation coefficient,
> yes:
>
>
>> for my project I would need the sample correlation coefficient, which
>> would be something like this:
>>
>> sum(x_i-meanX)(y_i-meanY)/(N-1)
>
> Yeah, that's fine too; this is another way of expressing the formula,
> though you're missing the two standard deviations in the denominator.
> It'll be clearer if I note that the means of X and Y are 0.
>
> --
> View this message in context: http://old.nabble.com/Mahout-Taste-covariance-between-two-items-tp26530825p26540395.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
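To make the equivalences discussed in this thread concrete, here is a minimal standalone Java sketch. It is not the Mahout/Taste code itself; the class PearsonSketch and its method names are hypothetical. It computes the sample Pearson correlation three ways: from raw sums with the centering adjustments quoted above, as the sample covariance sum(x_i-meanX)(y_i-meanY)/(N-1) divided by both sample standard deviations, and as the sum of products of z-scores divided by N-1.

```java
/**
 * Hypothetical sketch (not Mahout's implementation) comparing equivalent
 * ways of computing the sample Pearson correlation coefficient.
 */
final class PearsonSketch {

    private PearsonSketch() {}

    private static double mean(double[] v) {
        double s = 0.0;
        for (double d : v) s += d;
        return s / v.length;
    }

    /** Raw sums first, then the centering adjustments from the thread. */
    static double pearsonFromRawSums(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        // Each adjusted sum equals the same sum taken over the centered values
        // (x_i - meanX) and (y_i - meanY), because sumX * meanY == sumY * meanX
        // == n * meanX * meanY.
        double centeredXY = sumXY - meanY * sumX;
        double centeredX2 = sumX2 - meanX * sumX;
        double centeredY2 = sumY2 - meanY * sumY;
        return centeredXY / Math.sqrt(centeredX2 * centeredY2);
    }

    /** Sample covariance: sum (x_i - meanX)(y_i - meanY) / (N - 1). */
    static double sampleCovariance(double[] x, double[] y) {
        double mx = mean(x), my = mean(y);
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += (x[i] - mx) * (y[i] - my);
        return s / (x.length - 1);
    }

    /** Sample standard deviation, using the same N - 1 divisor. */
    private static double sampleStdDev(double[] v) {
        double m = mean(v), s = 0.0;
        for (double d : v) s += (d - m) * (d - m);
        return Math.sqrt(s / (v.length - 1));
    }

    /** Covariance divided by the two standard deviations. */
    static double pearsonFromCovariance(double[] x, double[] y) {
        return sampleCovariance(x, y) / (sampleStdDev(x) * sampleStdDev(y));
    }

    /** Sum of products of z-scores, divided by N - 1. */
    static double pearsonFromZScores(double[] x, double[] y) {
        double mx = mean(x), my = mean(y);
        double sx = sampleStdDev(x), sy = sampleStdDev(y);
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += ((x[i] - mx) / sx) * ((y[i] - my) / sy);
        return s / (x.length - 1);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 5, 4, 5};
        System.out.println("raw sums:    " + pearsonFromRawSums(x, y));
        System.out.println("cov/(sx*sy): " + pearsonFromCovariance(x, y));
        System.out.println("z-scores:    " + pearsonFromZScores(x, y));
        System.out.println("covariance:  " + sampleCovariance(x, y));
    }
}
```

On this sample data the three correlation variants should agree at about 0.7746, while the covariance alone comes out to 1.5. That gap is the point made above: sumXY/(N-1) on centered data is the sample covariance, and only after dividing by both standard deviations does it become the correlation. The N-1 factors cancel in the correlation, which is why the raw-sums form needs none. A constant input vector makes a denominator zero (or numerically close to it), a case this sketch does not guard against.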