From: marco turchi
Date: Fri, 15 Jul 2011 16:20:14 +0200
Subject: Re: Similarity between sparse vectors
To: user@mahout.apache.org

Dear Sean,
thanks a lot for the advice, everything is working perfectly!

Cheers
Marco

On Fri, Jul 15, 2011 at 2:15 PM, Sean Owen wrote:
> Cardinality should be set to whatever the logical dimension of the
> vector is -- it shouldn't be arbitrary. It's not like an "initial
> size" of a list. If you're dealing with vectors that have a
> potentially unbounded maximum dimension, use Integer.MAX_VALUE.
>
> As the name suggests, the implementation you use is for sparse
> vectors, meaning dimensions without a value have no representation. It
> would be a pretty poor sparse implementation if this were not true.
> So, no, the cardinality has no direct effect on memory.
>
> On Fri, Jul 15, 2011 at 1:00 PM, marco turchi wrote:
> > Hi
> > thanks a lot
> >
> > I also have another problem ( :-) ). As I wrote in the previous email, I'm
> > using the RandomAccessSparseVector representation to store sparse vectors. I
> > need to sum some of them together, so I use the method plus, but it seems
> > that it requires the same vector cardinality. I set the initial cardinality
> > of each vector to a big value, but I was wondering whether it is a huge waste
> > of memory or whether everything is optimized inside the
> > RandomAccessSparseVector class. In that case, is there an optimal way to set
> > the cardinality?
> >
> > Thanks again
> > Marco
> >
> > On Fri, Jul 15, 2011 at 1:50 PM, Sean Owen wrote:
> >
> >> This is simply Euclidean distance squared. Take the square root if you
> >> need the simple Euclidean distance.
> >>
> >> On Fri, Jul 15, 2011 at 12:36 PM, marco turchi wrote:
> >> > Dear All,
> >> > I'm a newcomer to Mahout and I'm trying to compute the cosine similarity
> >> > between two sparse vectors.
> >> > I have loaded them using the class RandomAccessSparseVector. I notice that
> >> > there is a method called getDistanceSquared. Which kind of vector distance
> >> > is implemented? Is there a method to compute this distance directly?
> >> >
> >> > Thanks a lot in advance for your help
> >> > Marco
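
For anyone reading this thread later, here is a rough sketch of the three points discussed above: squared Euclidean distance versus cosine similarity, summing sparse vectors, and why a large logical dimension need not cost memory. It deliberately does not use Mahout. The class SparseVectorSketch and all of its methods are hypothetical names invented for illustration, and a plain java.util.Map from index to value stands in for a sparse vector. Returning 0.0 for the cosine of an all-zero vector is also just an assumption made here, not something stated in the thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: plain Java maps standing in for sparse vectors (not Mahout's API). */
public class SparseVectorSketch {

    /** Squared Euclidean distance: sum over all indices of (a_i - b_i)^2. */
    public static double distanceSquared(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        // Indices present in a (b contributes 0.0 where it has no entry).
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            double diff = e.getValue() - b.getOrDefault(e.getKey(), 0.0);
            sum += diff * diff;
        }
        // Indices present only in b.
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                sum += e.getValue() * e.getValue();
            }
        }
        return sum;
    }

    /** Dot product; only indices stored in both vectors contribute. */
    public static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            sum += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0.0 for a zero vector (assumption). */
    public static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double denom = Math.sqrt(dot(a, a) * dot(b, b));
        return denom == 0.0 ? 0.0 : dot(a, b) / denom;
    }

    /** Element-wise sum; the result stores only indices that appear in a or b. */
    public static Map<Integer, Double> plus(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> result = new HashMap<>(a);
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            result.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Both vectors share one very large logical dimension, yet each stores two entries.
        Map<Integer, Double> a = new HashMap<>();
        a.put(0, 1.0);
        a.put(5, 2.0);
        Map<Integer, Double> b = new HashMap<>();
        b.put(5, 2.0);
        b.put(1_000_000, 1.0);

        double d2 = distanceSquared(a, b);
        System.out.println("squared Euclidean distance = " + d2);
        System.out.println("Euclidean distance         = " + Math.sqrt(d2));
        System.out.println("cosine similarity          = " + cosine(a, b));
        System.out.println("stored entries in a + b    = " + plus(a, b).size());
    }
}
```

The number of stored entries depends only on the non-zero values, which is the sense in which the declared cardinality has no direct effect on memory. In Mahout itself, the equivalent step would be to give every vector the same logical cardinality (Integer.MAX_VALUE if the dimension is unbounded, as suggested above) before calling plus.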