Date: Mon, 20 Apr 2009 23:04:23 -0700
Subject: Re: Taste as External Web Server
From: Matthew Roberson
To: mahout-user@lucene.apache.org

Yes, I am planning on implementing something along the lines of (3). As of
now, the data will persist in a .txt file, but it will probably move to
MySQL in the near future. Essentially, I would like to update the DataModel
held in memory on each successive HTTP request and only reload it from the
file periodically, e.g. when rebooting the web service.

On Mon, Apr 20, 2009 at 10:17 PM, Sean Owen wrote:

> So there are two important components here, your Recommender and your
> DataModel.
>
>
> DataModels should always have the most up-to-date data about your
> domain -- they don't cache or anything (well... FileDataModel reads
> into memory because it is just not efficient to seek through a file
> for data every time). So yes you want new information to immediately
> update the DataModel if you can.
>
> Yes calling refresh(null) will cause FileDataModel to reload all the
> data from the file. I agree it does not sound efficient to just use
> this, but let me make a couple related points:
>
> 1) You can push updates to the file without re-pushing the whole file.
> If your main data file is /foo/data.txt.gz, you can push a file like
> /foo/data.update1.txt.gz next to it, and that data will be read after
> the main file and override what is in the main file. However, it is
> still not efficient to push a small file and reload on every update. I
> would consider this only if you are willing to batch updates and push
> them periodically instead.
>
> 2) You probably want to persist the data you are receiving, maybe in a
> database? If the data already exists in a database, you can use
> something like MySQLJDBCDataModel to read from there instead of a file.
>
> 3) Or, I imagine you are persisting this data somehow, maybe not in a
> database. You can always write a custom DataModel based on that, again
> rather than also updating a file. If you are considering updating the
> data structures you see in FileDataModel -- I think you are going down
> this road. I might suggest you just copy-and-paste it and toss the
> parts you don't want, add logic you need.
>
> 4) Right now FileDataModel.{set,remove}Preference() throws an
> exception since these are not supported -- the implementation is
> read-only. I could change this to make these methods update the
> in-memory representation -- but it would not change the underlying
> file, and any such updates would be lost on the next reload. Still, if
> it helps meet your needs I can make that change.
>
>
> The Recommender on the other hand, I would not refresh on every
> request -- certainly not slope-one, as this algorithm needs a lot of
> preprocessing. Instead I would refresh it periodically -- once an
> hour, day -- whatever meets your performance / freshness goals.
>
>
> On Tue, Apr 21, 2009 at 6:03 AM, Matthew Roberson wrote:
> > btw using FileDataModel and RecommenderServlet to run slope one
> > Recommender as web service.
> >
> > I wanted to update the datamodel for each http request as more user
> > preference data will be generated with each new request.
> >
> > Would this be handled by a call to the reload() function contained
> > within FileDataModel.java for each HTTP request?
> >
> > It appears that this is the function call that begins the process of
> > updating the datamodel via a call to processfile().
> >
> > Also, the reload function requires all ratings files to be uploaded to
> > update the datamodel. I assume this is done because the Map containing
> > users and their preferences is local only to the processfile() method.
> > I was planning on making this Map global so that I can update the
> > datamodel without having to upload all the ratings files. Do you see
> > any pitfalls in this plan?
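
For illustration, here is a rough sketch of the behaviour discussed in this
thread: preferences are updated in memory on every request, and the file is
only re-read periodically. It is not Taste code. The class
InMemoryPreferenceStore and all of its methods are hypothetical names, it
uses only the JDK, and it assumes one "userID,itemID,preference" record per
line in the data file. As point (4) above warns, any update that was never
written back to the file disappears on the next reload.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a custom DataModel; this is not a Taste class.
class InMemoryPreferenceStore {
    private final Path dataFile;
    // userID -> (itemID -> preference); replaced wholesale by reload().
    private volatile Map<String, Map<String, Double>> prefs = new ConcurrentHashMap<>();

    InMemoryPreferenceStore(Path dataFile) throws IOException {
        this.dataFile = dataFile;
        reload();
    }

    // Per-request update: touches memory only, never the file.
    void setPreference(String userID, String itemID, double value) {
        prefs.computeIfAbsent(userID, u -> new ConcurrentHashMap<>()).put(itemID, value);
    }

    void removePreference(String userID, String itemID) {
        Map<String, Double> items = prefs.get(userID);
        if (items != null) {
            items.remove(itemID);
        }
    }

    // Returns null when the user has no preference for the item.
    Double getPreference(String userID, String itemID) {
        Map<String, Double> items = prefs.get(userID);
        return items == null ? null : items.get(itemID);
    }

    // Periodic full reload; in-memory updates not in the file are dropped.
    void reload() throws IOException {
        Map<String, Map<String, Double>> fresh = new ConcurrentHashMap<>();
        for (String line : Files.readAllLines(dataFile, StandardCharsets.UTF_8)) {
            String[] fields = line.trim().split(",");
            if (fields.length < 3) {
                continue; // skip blank or malformed lines
            }
            fresh.computeIfAbsent(fields[0], u -> new ConcurrentHashMap<>())
                 .put(fields[1], Double.parseDouble(fields[2]));
        }
        prefs = fresh;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("prefs", ".txt");
        Files.write(file, Arrays.asList("1,10,4.0", "2,10,3.5"), StandardCharsets.UTF_8);
        InMemoryPreferenceStore store = new InMemoryPreferenceStore(file);
        store.setPreference("1", "11", 5.0); // visible immediately
        System.out.println(store.getPreference("1", "11"));
        store.reload(); // the unsaved update is no longer present
        System.out.println(store.getPreference("1", "11"));
        Files.delete(file);
    }
}
```

In a servlet, setPreference() would be called from the request handler and
reload() from a scheduled task or at restart. Writing each update to the
persistent store as well (file, or later MySQL) is what keeps it across
reloads; the sketch leaves that part out.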