From: Abhishek Roy
To: user@mahout.apache.org
Subject: Re: Custom Item Similarity :datamodel not sure
Date: Fri, 28 Sep 2012 16:07:41 +0000 (UTC)

Sean Owen <...gmail.com> writes:

> But what was your input? Item item similarity? Then you already had item
> item similarity. And what you compute from that method is probably not
> meaningful.
>
> You don't have a recommender problem so there is no question of what to
> feed to a recommender. Don't use it at all. You already have all you need
> in your ItemSimilarity.
>
> On Sep 27, 2012 7:50 PM, "Abhishek Roy" <...gmail.com> wrote:
>
> > Thanks Sean. I get your point. Will try incorporating that.
> > Earlier, as I mentioned, for a small item count (<5000), the input
> > (datamodel) to the recommender was nC2 item-item pairs (tried to feed
> > uniform preference for each item to every other item), without the
> > rating field, and then called recommender.mostSimilarItems() to get the
> > list. nC2 works, but is not scalable. It worked well as the
> > recommendations were the similar items (that works for me now).
> > Although am digging through the code to see what least input I can give,
> > any meaningful suggestion for data input would be awesome.

Thanks for your inputs, Sean. I implemented the top N (most similar items)
by looking at and reusing the available mostSimilarItems() code. Works fine.

Now, scale in action! Testing with a set of 200,000 items, computing the
most similar items for 1 item takes around 20 secs. My approach is to
pre-compute the most similar items for all 200,000 items. I am not looking
at Hadoop for now (2000 item base currently). I know I can reduce my data
size for similarity computation.
What are my options?
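A minimal sketch of the "pre-compute the most similar items for every item" approach, written in plain Java without depending on Mahout. It assumes items are numbered 0..numItems-1 and that the similarity is symmetric (sim(a, b) == sim(b, a)); `PairSimilarity` and `TopNSimilarItems` are hypothetical names standing in for a custom `ItemSimilarity`, not Mahout API. Each item's top N is kept in a bounded min-heap, and every unordered pair is scored once and offered to both items' heaps.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSimilarItems {

    /** Hypothetical stand-in for a custom item-item similarity; assumed symmetric. */
    @FunctionalInterface
    public interface PairSimilarity {
        double similarity(int itemA, int itemB);
    }

    /** A candidate neighbour and its similarity score. */
    private static final class Scored {
        final int item;
        final double score;

        Scored(int item, double score) {
            this.item = item;
            this.score = score;
        }
    }

    // Min-heap ordered by score: the weakest of the current top N sits at the head.
    private static PriorityQueue<Scored> newHeap() {
        return new PriorityQueue<>(Comparator.comparingDouble((Scored s) -> s.score));
    }

    // Offer a candidate, keeping at most n entries by evicting the lowest score.
    private static void offer(PriorityQueue<Scored> heap, int n, Scored candidate) {
        if (n <= 0) {
            return;
        }
        if (heap.size() < n) {
            heap.add(candidate);
        } else if (candidate.score > heap.peek().score) {
            heap.poll();
            heap.add(candidate);
        }
    }

    /**
     * Returns, for each item 0..numItems-1, the ids of its n most similar items,
     * best first. Each unordered pair is scored once and offered to both heaps,
     * so sim is called numItems * (numItems - 1) / 2 times.
     */
    public static List<List<Integer>> precomputeAll(int numItems, int n, PairSimilarity sim) {
        List<PriorityQueue<Scored>> heaps = new ArrayList<>(numItems);
        for (int i = 0; i < numItems; i++) {
            heaps.add(newHeap());
        }
        for (int a = 0; a < numItems; a++) {
            for (int b = a + 1; b < numItems; b++) {
                double s = sim.similarity(a, b);
                offer(heaps.get(a), n, new Scored(b, s));
                offer(heaps.get(b), n, new Scored(a, s));
            }
        }
        List<List<Integer>> result = new ArrayList<>(numItems);
        for (PriorityQueue<Scored> heap : heaps) {
            // Drain into a list sorted by descending score.
            List<Scored> sorted = new ArrayList<>(heap);
            sorted.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
            List<Integer> ids = new ArrayList<>(sorted.size());
            for (Scored s : sorted) {
                ids.add(s.item);
            }
            result.add(ids);
        }
        return result;
    }
}
```

Scoring each pair once halves the number of similarity calls compared with running a separate top-N search for every item, but the cost is still quadratic: for 200,000 items it is 200,000 × 199,999 / 2, roughly 2 × 10^10 evaluations, so reducing the data or candidate set per item still matters at that scale.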