mahout-user mailing list archives

From Matthew Roberson <recom.t...@gmail.com>
Subject Re: Taste as External Web Server
Date Tue, 21 Apr 2009 06:04:23 GMT
Yes, I am planning on implementing something along the lines of (3).

As of now, the data will be persisted in a .txt file, but will probably move
to MySQL in the near future.

Essentially, I would like to update the datamodel held in memory on each
successive HTTP request, and reload it from the file only periodically,
e.g. when the web service is restarted.
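
A minimal sketch of this plan, not tied to the Taste API: the class name
InMemoryPreferences and its methods are hypothetical, and the
"userID,itemID,value" line format is an assumption about the data file.
Each request updates the map directly, and a full reload happens only when
asked for.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a custom DataModel's storage; not part of Taste.
final class InMemoryPreferences {
    // userID -> (itemID -> preference value)
    private Map<String, Map<String, Double>> prefs = new HashMap<>();

    // Called from each HTTP request that produces a new preference.
    synchronized void setPreference(String user, String item, double value) {
        prefs.computeIfAbsent(user, u -> new HashMap<>()).put(item, value);
    }

    // Returns null when the user has no preference for the item.
    synchronized Double getPreference(String user, String item) {
        Map<String, Double> items = prefs.get(user);
        return items == null ? null : items.get(item);
    }

    // Full reload from "userID,itemID,value" lines, e.g. at service startup.
    // Replaces everything in memory, so updates that were never written to
    // the backing store are lost at this point.
    synchronized void reload(List<String> lines) {
        Map<String, Map<String, Double>> fresh = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = line.split(",");
            fresh.computeIfAbsent(f[0].trim(), u -> new HashMap<>())
                 .put(f[1].trim(), Double.parseDouble(f[2].trim()));
        }
        prefs = fresh;
    }
}
```

The methods are synchronized because a servlet container serves requests
on several threads; a real implementation would also have to write each
update to the file or database so that a reload does not drop it.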


On Mon, Apr 20, 2009 at 10:17 PM, Sean Owen <srowen@gmail.com> wrote:

> So there are two important components here, your Recommender and your
> DataModel.
>
>
> DataModels should always have the most up-to-date data about your
> domain -- they don't cache or anything (well... FileDataModel reads
> into memory because it is just not efficient to seek through a file
> for data every time). So yes you want new information to immediately
> update the DataModel if you can.
>
> Yes calling refresh(null) will cause FileDataModel to reload all the
> data from the file. I agree it does not sound efficient to just use
> this, but let me make a couple related points:
>
> 1) You can push updates to the file without re-pushing the whole file.
> If your main data file is /foo/data.txt.gz, you can push a file like
> /foo/data.update1.txt.gz next to it, and that data will be read after
> the main file and override what is in the main file. However, it is
> still not efficient to push a small file and reload on every update. I
> would consider this only if you are willing to batch updates and push
> them periodically instead.
>
> 2) You probably want to persist the data you are receiving, maybe in a
> database? If the data already exists in a database, you can use
> something like MySQLJDBCDataModel to read from there instead
> of a file.
>
> 3) Or, I imagine you are persisting this data somehow, maybe not in a
> database. You can always write a custom DataModel based on that, again
> rather than also updating a file. If you are considering updating the
> data structures you see in FileDataModel -- I think you are going down
> this road. I might suggest you just copy-and-paste it and toss the
> parts you don't want, add logic you need.
>
> 4) Right now FileDataModel.{set,remove}Preference() throws an
> exception since these are not supported -- the implementation is
> read-only. I could change this to make these methods update the
> in-memory representation -- but it would not change the underlying
> file, and any such updates would be lost on the next reload. Still if
> it helps meet your needs I can make that change.
>
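The override behaviour described in point (1) can be illustrated with a
small sketch. This is not FileDataModel's actual code: the class
UpdateOverlay is hypothetical, the "userID,itemID,value" line format is
assumed, and lists of strings stand in for the contents of the main file
and the update files.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "update files override the main file".
final class UpdateOverlay {
    // Reads the sources in order (main file first, then updates).
    // Key is "userID,itemID"; a later source overwrites an earlier value.
    static Map<String, Double> load(List<List<String>> sourcesInOrder) {
        Map<String, Double> prefs = new LinkedHashMap<>();
        for (List<String> source : sourcesInOrder) {
            for (String line : source) {
                String[] f = line.split(",");
                String key = f[0].trim() + "," + f[1].trim();
                prefs.put(key, Double.parseDouble(f[2].trim()));
            }
        }
        return prefs;
    }
}
```

With a main source of "1,10,3.0" and "1,11,2.0" and an update source of
"1,10,5.0" and "2,10,4.0", the result holds three preferences and user 1's
value for item 10 is 5.0. Every load still re-reads all sources, which is
why batching updates matters.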
>
> The Recommender on the other hand, I would not refresh on every
> request -- certainly not slope-one, as this algorithm needs a lot of
> preprocessing. Instead I would refresh it periodically -- once an
> hour, day -- whatever meets your performance / freshness goals.
>
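One way to refresh on a schedule rather than per request is a
ScheduledExecutorService from the JDK. The sketch below is an assumption
about how this could be wired up, not something Taste provides: the class
PeriodicRefresher is hypothetical, and the Runnable stands in for whatever
call refreshes the Recommender.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that runs a refresh task at a fixed delay.
final class PeriodicRefresher {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // Runs refreshTask repeatedly, waiting `period` between the end of one
    // run and the start of the next, so slow refreshes never overlap.
    void start(Runnable refreshTask, long period, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                refreshTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs,
                // so report it and keep the schedule alive.
                System.err.println("refresh failed: " + e);
            }
        };
        scheduler.scheduleWithFixedDelay(guarded, period, period, unit);
    }

    // Stops scheduling; call this when the web service shuts down.
    void stop() {
        scheduler.shutdownNow();
    }
}
```

The period (for example one hour) is the knob that trades recommendation
freshness against the preprocessing cost of each refresh.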
>
> On Tue, Apr 21, 2009 at 6:03 AM, Matthew Roberson <recom.team@gmail.com>
> wrote:
> > btw using FileDataModel and RecommenderServlet to run slope one
> > Recommender as web service.
> >
> > I wanted to update the datamodel for each http request as more user
> > preference data will be generated with each new request.
> >
> > Would this be handled by a call to the reload() function contained within
> > FileDataModel.java for each HTTP request?
> >
> > It appears that this is the function call that begins the process of
> > updating the datamodel via a call to processfile().
> >
> > Also, the reload function requires all ratings files to be uploaded to
> > update the datamodel. I assume this is done because the Map containing
> > users and their preferences is local only to the processfile() method.
> > I was planning on making this Map global so that I can update the
> > datamodel without having to upload all the ratings files.  Do you see
> > any pitfalls in this plan?
>
