From: charlysf
To: mahout-user@lucene.apache.org
Date: Mon, 6 Jul 2009 16:36:57 -0700 (PDT)
Subject: Compute similarities for a huge quantity of data

Hello,

I am currently working on a small database. As I understand it, when I need the similarity between users, it is basically computed between all pairs of users. Is that right, or is there a better way? If that is how it works, how can I expect a quick computation for 1 million rows?

I also don't see the difference between asking for a user's neighborhood and computing the similarity for all pairs of users.

I thought something could be interesting: make clusters of users, and only compute the similarity between users in the same cluster.

Thanks

--
View this message in context: http://www.nabble.com/Compute-similarities-for-an-hudge-quantity-of-data-tp24364711p24364711.html
Sent from the Mahout User List mailing list archive at Nabble.com.
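To make the cost difference in the question concrete, here is a minimal plain-Python sketch. It is not Mahout's API. It assumes ratings are stored as `{user: {item: rating}}` dicts, uses cosine similarity as an example measure, and takes cluster labels from a hypothetical caller-supplied `cluster_of` mapping (any clustering algorithm could produce it). Precomputing every pair needs n(n-1)/2 comparisons, a neighborhood query for a single user needs only n-1, and restricting comparisons to users in the same cluster shrinks the all-pairs work.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pair_count(n):
    """Number of unordered user pairs: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors {item: rating}."""
    common = a.keys() & b.keys()
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def all_pairs(ratings):
    """Similarity for every unordered pair of users: n(n-1)/2 comparisons."""
    return {(u, v): cosine(ratings[u], ratings[v])
            for u, v in combinations(sorted(ratings), 2)}


def neighborhood(ratings, user, k):
    """Top-k most similar users to ONE user: only n - 1 comparisons."""
    scores = [(cosine(ratings[user], ratings[v]), v)
              for v in ratings if v != user]
    scores.sort(reverse=True)
    return [v for _, v in scores[:k]]


def within_clusters(ratings, cluster_of):
    """Compare only users sharing a cluster label (hypothetical assignment)."""
    members = defaultdict(list)
    for u in ratings:
        members[cluster_of[u]].append(u)
    sims = {}
    for group in members.values():
        for u, v in combinations(sorted(group), 2):
            sims[(u, v)] = cosine(ratings[u], ratings[v])
    return sims


if __name__ == "__main__":
    ratings = {
        "alice": {"a": 5, "b": 3},
        "bob": {"a": 4, "b": 3},
        "carol": {"c": 5},
        "dave": {"c": 4, "d": 2},
    }
    cluster_of = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}
    print(pair_count(1_000_000))  # 499999500000
    print(len(all_pairs(ratings)))  # 6
    print(sorted(within_clusters(ratings, cluster_of)))  # [('alice', 'bob'), ('carol', 'dave')]
    print(neighborhood(ratings, "alice", 1))  # ['bob']
```

For 1 million users, all pairs is about 500 billion comparisons. If, as an illustrative assumption, the users split into 100 equal clusters of 10,000, the within-cluster work drops to 100 × 49,995,000 = 4,999,500,000, roughly 100 times less. The trade-off is that users in different clusters are never compared, so true neighbors that land across a cluster boundary can be missed.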