Date: Tue, 21 Apr 2009 06:17:05 +0100
Subject: Re: Taste as External Web Server
From: Sean Owen
To: mahout-user@lucene.apache.org

So there are two important components here, your Recommender and your
DataModel.

DataModels should always have the most up-to-date data about your
domain -- they don't cache or anything (well... FileDataModel reads
into memory because it is just not efficient to seek through a file
for data every time). So yes, you want new information to immediately
update the DataModel if you can.

Yes, calling refresh(null) will cause FileDataModel to reload all the
data from the file. I agree it does not sound efficient to just use
this, but let me make a couple of related points:

1) You can push updates to the file without re-pushing the whole file.
If your main data file is /foo/data.txt.gz, you can push a file like
/foo/data.update1.txt.gz next to it, and that data will be read after
the main file and override what is in the main file. However, it is
still not efficient to push a small file and reload on every update. I
would consider this only if you are willing to batch updates and push
them periodically instead.

2) You probably want to persist the data you are receiving, maybe in a
database? If the data already exists in a database, you can use
something like MySQLJDBCDataModel to read from there rather than from
a file.

3) Or, I imagine you are persisting this data somehow, maybe not in a
database.
You can always write a custom DataModel based on that, again
rather than also updating a file. If you are considering updating the
data structures you see in FileDataModel -- I think you are going down
this road. I might suggest you just copy-and-paste it, toss the parts
you don't want, and add the logic you need.

4) Right now FileDataModel.{set,remove}Preference() throws an
exception since these are not supported -- the implementation is
read-only. I could change this to make these methods update the
in-memory representation -- but it would not change the underlying
file, and any such updates would be lost on the next reload. Still, if
it helps meet your needs I can make that change.

The Recommender, on the other hand, I would not refresh on every
request -- certainly not slope-one, as this algorithm needs a lot of
preprocessing. Instead I would refresh it periodically -- once an
hour, once a day -- whatever meets your performance / freshness goals.

On Tue, Apr 21, 2009 at 6:03 AM, Matthew Roberson wrote:
> btw using FileDataModel and RecommenderServlet to run slope one Recommender
> as web service.
>
> I wanted to update the datamodel for each http request as more user
> preference data will be generated with each new request.
>
> Would this be handled by a call to the reload() function contained within
> FileDataModel.java for each HTTP request?
>
> It appears that this is the function call that begins the process of
> updating the datamodel via a call to processfile().
>
> Also, the reload function requires all ratings files to be uploaded to
> update the datamodel. I assume this is done because the Map containing users
> and their preferences is local only to the processfile() method. I was
> planning on making this Map global so that I can update the datamodel
> without having to upload all the ratings files. Do you see any pitfalls in
> this plan?
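
To make the "refresh the Recommender periodically, not per request"
suggestion concrete, here is a rough sketch of a background refresher.
It is only an illustration: RefreshTarget and PeriodicRefresher are
made-up names, not Taste classes, and you would plug in whatever
refresh call your recommender actually exposes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for "something that can be refreshed".
// This is NOT the Taste interface; adapt it to your recommender.
interface RefreshTarget {
    void refresh();
}

// Runs the (expensive) refresh on a background thread at a fixed interval,
// so request-handling threads never pay for it.
final class PeriodicRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "recommender-refresh");
            t.setDaemon(true); // do not keep the JVM alive just for refreshes
            return t;
        });
    private final RefreshTarget target;
    private final AtomicInteger completed = new AtomicInteger();

    PeriodicRefresher(RefreshTarget target) {
        this.target = target;
    }

    // Waits one full period before the first refresh, then waits the same
    // period after each refresh finishes, so runs never overlap.
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::refreshOnce, period, period, unit);
    }

    // A single refresh attempt. Runtime failures are swallowed here because
    // an exception escaping a scheduled task cancels all future runs.
    void refreshOnce() {
        try {
            target.refresh();
            completed.incrementAndGet();
        } catch (RuntimeException e) {
            // A real service would log this; the next scheduled run retries.
        }
    }

    int completedRefreshes() {
        return completed.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Usage would look something like
new PeriodicRefresher(() -> recommender.refresh(null)).start(1, TimeUnit.HOURS),
assuming your recommender accepts the same refresh(null) call discussed
above. The interval is the knob for the performance / freshness
trade-off.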
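
On points 1 and 4 above (update files overriding the main file, and
in-memory set/remove), the behaviour can be pictured with a small
stand-alone sketch. PreferenceStore is a hypothetical class, not
FileDataModel, and the "user,item,value" line format is an assumption
made for the example. Like the in-memory change described in point 4,
nothing here is written back to a file, so edits would be lost on a
reload from disk.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory preference store; not part of Taste.
// Assumes each data line looks like "user,item,value".
final class PreferenceStore {
    // user -> (item -> preference value); concurrent maps because a servlet
    // may read and write from several request threads at once.
    private final Map<String, Map<String, Double>> byUser = new ConcurrentHashMap<>();

    // Applies lines in order. Calling this first with the main file's lines
    // and then with an update file's lines gives "later data overrides".
    void applyLines(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected user,item,value: " + line);
            }
            setPreference(parts[0].trim(), parts[1].trim(),
                          Double.parseDouble(parts[2].trim()));
        }
    }

    // In-memory only: this change is not persisted anywhere.
    void setPreference(String user, String item, double value) {
        byUser.computeIfAbsent(user, u -> new ConcurrentHashMap<>()).put(item, value);
    }

    // In-memory only: removing an unknown preference is a no-op.
    void removePreference(String user, String item) {
        Map<String, Double> prefs = byUser.get(user);
        if (prefs != null) {
            prefs.remove(item);
        }
    }

    // Returns null when the user has no preference for the item.
    Double getPreference(String user, String item) {
        Map<String, Double> prefs = byUser.get(user);
        return prefs == null ? null : prefs.get(item);
    }
}
```

If you go the custom-DataModel route, the persistence side (database or
otherwise) would stay the source of truth, with a structure like this
serving only as the fast in-memory view.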