Message-ID: <20090422140532.cl6jnjm2o0c8oksg@www.di.unito.it>
Date: Wed, 22 Apr 2009 14:05:32 +0200
From: Andre Panisson
To: mahout-user@lucene.apache.org
Subject: Re: Considering removing User/Item abstractions

I think removing the User and Item abstractions would be a good idea. The User interface is a bit more complex because of its getPreferences methods, but I think these can easily be ported to the DataModel. There will be some impact on already-written code, but I think the benefits are worthwhile.

I don't know whether removing the Preference abstraction would improve performance. The getPreferences methods are very useful for iterating over the preferences of users and items, and I think it saves a lot of lookups when the user/item association is held in a single object.

André

Quoting Sean Owen:

> I am considering a somewhat large change to org.apache.mahout.cf.taste code
> and would like to solicit feedback from users.
>
> The change would be to remove the User, Item and Preference
> interfaces/abstractions from the code. Everything would proceed in terms of
> user and item IDs, and preference values instead.
>
> The reasons for these interfaces originally were, well, it seemed nice.
> It also provided a way for implementors to substitute domain-specific
> implementations with additional information.
>
> But there are problems too.
>
> - Do methods take a User, or user ID? The code is not consistent in this
>   regard. If User, the caller is forced to look up a User if it only has an
>   ID. (Conversely, if the caller already has a User, and the method needs a
>   User, then passing an ID only forces a redundant lookup. I think this is
>   rarer.)
>
> - Factory method problem. There are many points in the code where it should
>   call to factory methods to generate a User/Item/Preference object since the
>   domain may use specialized implementations instead of GenericUser, etc. At
>   the moment some methods just assume GenericUser, etc. Fixing this would be a
>   bit hard but would more importantly impact performance I think.
>
> - Object overhead. Holding these extra objects has a cost in memory and
>   performance.
>
> The code already really assumes there are nothing but user and item IDs and
> a pref value. So why not make the core reflect this and gain some simplicity
> and speed performance?
>
> I think that domains that need to inject extra information can still do this
> fine without needing custom User, Item implementations.
>
> It is just a thought now. Anybody have more?
>
> Sean
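
To make the proposal concrete, here is a minimal sketch of what an ID-only model might look like, with the getPreferences-style iteration moved from User and Item onto the model itself. Every name in it (IdBasedDataModel, setPreference, getPreferenceValue, getPreferencesFromUser, getPreferencesForItem) is hypothetical and chosen for illustration; none of it is the existing Taste API or a design anyone in this thread has committed to.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the actual Taste API: preferences keyed only by
// user and item IDs, with no User, Item or Preference objects.
final class IdBasedDataModel {

    // userID -> (itemID -> preference value)
    private final Map<Long, Map<Long, Float>> prefsByUser = new HashMap<>();
    // itemID -> (userID -> preference value); kept so item-side iteration
    // does not need a scan over all users.
    private final Map<Long, Map<Long, Float>> prefsByItem = new HashMap<>();

    void setPreference(long userID, long itemID, float value) {
        prefsByUser.computeIfAbsent(userID, k -> new HashMap<>()).put(itemID, value);
        prefsByItem.computeIfAbsent(itemID, k -> new HashMap<>()).put(userID, value);
    }

    // Returns null when the user has expressed no preference for the item.
    Float getPreferenceValue(long userID, long itemID) {
        Map<Long, Float> prefs = prefsByUser.get(userID);
        return prefs == null ? null : prefs.get(itemID);
    }

    // Stands in for a User.getPreferences()-style call: itemID -> value.
    Map<Long, Float> getPreferencesFromUser(long userID) {
        Map<Long, Float> prefs = prefsByUser.get(userID);
        return prefs == null ? Collections.<Long, Float>emptyMap()
                             : Collections.unmodifiableMap(prefs);
    }

    // Stands in for an Item.getPreferences()-style call: userID -> value.
    Map<Long, Float> getPreferencesForItem(long itemID) {
        Map<Long, Float> prefs = prefsByItem.get(itemID);
        return prefs == null ? Collections.<Long, Float>emptyMap()
                             : Collections.unmodifiableMap(prefs);
    }

    public static void main(String[] args) {
        IdBasedDataModel model = new IdBasedDataModel();
        model.setPreference(1L, 10L, 4.0f);
        model.setPreference(2L, 10L, 2.5f);
        // Callers that only hold IDs never have to look up a User first.
        System.out.println(model.getPreferenceValue(1L, 10L));
        System.out.println(model.getPreferencesForItem(10L).size());
    }
}
```

The sketch only shows the shape of the API. It uses boxed Long and Float values in HashMaps for brevity, so it would not by itself deliver the memory savings discussed above; an implementation aiming for those would presumably need primitive-keyed storage.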