Mailing-List: contact mahout-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mahout-user@lucene.apache.org
Received-SPF: pass (nike.apache.org: domain of lists@nabble.com designates
 216.139.236.158 as permitted sender)
Message-ID: <24364711.post@talk.nabble.com>
Date: Mon, 6 Jul 2009 16:36:57 -0700 (PDT)
From: charlysf <charles.ruelle@gmail.com>
To: mahout-user@lucene.apache.org
Subject: Compute similarities for a huge quantity of data
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Hello,

I am currently working on a small database. As I understand it, when I need
the similarity between users, it basically means computing the similarity
between all pairs of users.

Is that right, or is there a better way?
If that is the case, how can I expect a fast computation for 1 million rows?
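To make the cost concrete, here is a minimal brute-force sketch in plain Python. It assumes cosine similarity over sparse rating vectors, where unrated items count as 0. The toy data and the helpers `cosine`, `all_pairs` and `pair_count` are hypothetical illustrations, not Mahout's API. Comparing every pair of n users needs n(n-1)/2 similarity computations, which is about 5 x 10^11 for 1 million users.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "u1": {"a": 5.0, "b": 3.0},
    "u2": {"a": 4.0, "c": 2.0},
    "u3": {"b": 1.0, "c": 5.0},
    "u4": {"a": 2.0, "b": 4.0, "c": 1.0},
}

def cosine(r1, r2):
    """Cosine similarity of two sparse rating vectors (unrated items count as 0)."""
    dot = sum(r * r2[item] for item, r in r1.items() if item in r2)
    n1 = sqrt(sum(r * r for r in r1.values()))
    n2 = sqrt(sum(r * r for r in r2.values()))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (n1 * n2)

def all_pairs(ratings):
    """Brute force: one similarity computation per unordered pair of users."""
    return {(a, b): cosine(ratings[a], ratings[b])
            for a, b in combinations(sorted(ratings), 2)}

def pair_count(n_users):
    """Number of unordered user pairs, n(n-1)/2."""
    return n_users * (n_users - 1) // 2

sims = all_pairs(ratings)
print(len(sims))              # 6
print(pair_count(1_000_000))  # 499999500000
```

The count grows quadratically: doubling the number of users roughly quadruples the work.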

I don't see the difference between asking for a user's neighborhood and
computing the similarity for all pairs of users.
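One way to see the difference: a nearest-N neighborhood for a single user only compares that user against the others, so it costs n-1 similarity computations rather than n(n-1)/2. Computing the neighborhood of every user, however, amounts to the full all-pairs computation. The sketch below assumes the same cosine similarity and toy data as a plain-Python illustration; `nearest_n` is a hypothetical helper, not Mahout's neighborhood implementation.

```python
import heapq
from math import sqrt

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "u1": {"a": 5.0, "b": 3.0},
    "u2": {"a": 4.0, "c": 2.0},
    "u3": {"b": 1.0, "c": 5.0},
    "u4": {"a": 2.0, "b": 4.0, "c": 1.0},
}

def cosine(r1, r2):
    """Cosine similarity of two sparse rating vectors (unrated items count as 0)."""
    dot = sum(r * r2[item] for item, r in r1.items() if item in r2)
    n1 = sqrt(sum(r * r for r in r1.values()))
    n2 = sqrt(sum(r * r for r in r2.values()))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (n1 * n2)

def nearest_n(user, ratings, n):
    """Top-n most similar users to `user`, using one comparison per other user."""
    scored = ((cosine(ratings[user], ratings[other]), other)
              for other in ratings if other != user)
    # heapq.nlargest keeps only the n best (similarity, user) tuples.
    return [(other, sim) for sim, other in heapq.nlargest(n, scored)]

print([u for u, _ in nearest_n("u1", ratings, 2)])  # ['u4', 'u2']
```

This only helps when recommendations are needed for a few users at a time; precomputing neighborhoods for everyone brings back the quadratic cost.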

I thought the following idea could be interesting:
cluster the users first, and then only compute the similarity between users
within the same cluster.
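A minimal sketch of the clustering idea follows, assuming the cluster assignment was already computed elsewhere (for example by k-means over the rating vectors). The `clusters` mapping, the toy data and the helpers `within_cluster_pairs` and `pairs_with_clusters` are hypothetical. With 1 million users split into 100 equal clusters of 10,000, the number of comparisons drops from 499,999,500,000 to 4,999,500,000, about 100 times fewer.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "u1": {"a": 5.0, "b": 3.0},
    "u2": {"a": 4.0, "c": 2.0},
    "u3": {"b": 1.0, "c": 5.0},
    "u4": {"a": 2.0, "b": 4.0, "c": 1.0},
}
# Hypothetical precomputed assignment: user -> cluster id.
clusters = {"u1": 0, "u2": 0, "u4": 0, "u3": 1}

def cosine(r1, r2):
    """Cosine similarity of two sparse rating vectors (unrated items count as 0)."""
    dot = sum(r * r2[item] for item, r in r1.items() if item in r2)
    n1 = sqrt(sum(r * r for r in r1.values()))
    n2 = sqrt(sum(r * r for r in r2.values()))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (n1 * n2)

def within_cluster_pairs(ratings, clusters):
    """Compute similarities only for pairs of users that share a cluster."""
    members = defaultdict(list)
    for user, cluster_id in clusters.items():
        members[cluster_id].append(user)
    sims = {}
    for users in members.values():
        for a, b in combinations(sorted(users), 2):
            sims[(a, b)] = cosine(ratings[a], ratings[b])
    return sims

def pairs_with_clusters(cluster_sizes):
    """Total comparisons when pairs are only formed inside each cluster."""
    return sum(k * (k - 1) // 2 for k in cluster_sizes)

sims = within_cluster_pairs(ratings, clusters)
print(sorted(sims))                         # [('u1', 'u2'), ('u1', 'u4'), ('u2', 'u4')]
print(pairs_with_clusters([10_000] * 100))  # 4999500000
```

The trade-off is that the result becomes approximate: users in different clusters are never compared, so a genuinely similar user who landed in another cluster will be missed.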

Thanks
-- 
View this message in context: http://www.nabble.com/Compute-similarities-for-an-hudge-quantity-of-data-tp24364711p24364711.html
Sent from the Mahout User List mailing list archive at Nabble.com.