Subject: Re: Considering removing User/Item abstractions
From: Sean Owen
To: mahout-user@lucene.apache.org
Date: Fri, 1 May 2009 15:00:34 +0100

As an update on this thought, I began exploring exactly what the API impact would be. I think there is a big impact for those who subclassed or implemented components, but probably little for clients of the standard implementations.

I found one ugly snag when replacing Users and Items with IDs (Objects). These entities need an ordering; Users and Items implement Comparable. When dealing with IDs it is not sufficient to use Object; we must declare IDs as - get this - Comparable>. This turns into a mess.

One alternative is to assume all IDs are Strings, since some components like FileDataModel already necessarily assume this. But I don't like it.

I am inclined to defer this big change for now, since the code is getting new attention as part of Mahout and it is not a good time for a big change. I imagine it will happen sooner or later for performance reasons; the framework is eternally short on performance rather than flexibility, so the tradeoffs have to start happening.

More thoughts are welcome.

Sean

On Apr 22, 2009 1:16 PM, "Sean Owen" wrote:

Agree about Preference -- there will have to be some kind of object that represents an item-value pair.
While the interface probably goes away, the "GenericPreference" class probably stays. Anyone who customized a component probably has to make some non-trivial changes. People who just use standard components may actually have little or no changes.

Thanks for the input, I will await more comments. I should also add that performance has become my top priority, as it always seems that the code consumes too much time and memory. So this is a large reason why I favor a change that could simplify and speed up the code and data structures, even if it means reducing flexibility slightly.

Sean

On Wed, Apr 22, 2009 at 1:05 PM, Andre Panisson wrote:
> I think removing t...
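
For readers following the thread, here is a minimal Java sketch of the ordering snag described in the update above: once Users and Items are replaced by bare IDs, the IDs themselves have to carry the Comparable bound, and that bound gets repeated on every class and method that touches them. This is only an illustration under stated assumptions, not the actual Taste/Mahout code: `IdOrderingSketch`, `IdPreference`, and `sortedItemIDs` are hypothetical names invented here, and the self-bounded `K extends Comparable<K>` is one common way to express the constraint, not necessarily the declaration the message had in mind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Mahout/Taste code: every name below is invented for illustration.
public class IdOrderingSketch {

    // An item-value pair keyed by a bare ID instead of an Item object.
    // K must be comparable to itself so IDs keep the ordering that Users and Items provided.
    static final class IdPreference<K extends Comparable<K>>
            implements Comparable<IdPreference<K>> {
        final K itemID;
        final double value;

        IdPreference(K itemID, double value) {
            this.itemID = itemID;
            this.value = value;
        }

        // Preferences are ordered by item ID only; the value plays no part.
        @Override
        public int compareTo(IdPreference<K> other) {
            return itemID.compareTo(other.itemID);
        }
    }

    // The same bound has to be restated on every generic method that handles IDs.
    static <K extends Comparable<K>> List<K> sortedItemIDs(List<IdPreference<K>> prefs) {
        List<IdPreference<K>> copy = new ArrayList<IdPreference<K>>(prefs);
        Collections.sort(copy);
        List<K> ids = new ArrayList<K>();
        for (IdPreference<K> pref : copy) {
            ids.add(pref.itemID);
        }
        return ids;
    }

    public static void main(String[] args) {
        // String IDs satisfy the bound because String implements Comparable<String>.
        List<IdPreference<String>> prefs = Arrays.asList(
                new IdPreference<String>("item-b", 3.0),
                new IdPreference<String>("item-a", 5.0));
        System.out.println(sortedItemIDs(prefs)); // prints [item-a, item-b]
    }
}
```

Fixing the ID type to String, the alternative mentioned in the message, would remove the type parameter `K` from every signature above, at the cost of forcing all callers onto one ID type.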