mahout-user mailing list archives

From Sean Owen
Subject Re: Taste as External Web Server
Date Tue, 21 Apr 2009 05:17:05 GMT
There are two important components here: your Recommender and your DataModel.

DataModels should always have the most up-to-date data about your
domain -- they don't cache anything (well... FileDataModel reads the
file into memory, because it is just not efficient to seek through a
file for data every time). So yes, you want new information to update
the DataModel immediately if you can.

Yes, calling refresh(null) will cause FileDataModel to reload all the
data from the file. I agree that doing this on every request does not
sound efficient, but let me make a few related points:

1) You can push updates to the file without re-pushing the whole file.
If your main data file is /foo/data.txt.gz, you can push a file like
/foo/data.update1.txt.gz next to it, and that data will be read after
the main file and override what is in the main file. However, it is
still not efficient to push a small file and reload on every update. I
would consider this only if you are willing to batch updates and push
them periodically instead.

2) You probably want to persist the data you are receiving, maybe in a
database? If the data already exists in a database, you can use
something like MySQLJDBCDataModel to read it from there instead of
from a file.

3) Or, I imagine you are persisting this data somehow, maybe not in a
database. You can always write a custom DataModel on top of that store,
again rather than also updating a file. If you are considering changing
the data structures you see in FileDataModel, I think you are already
going down this road. I might suggest you just copy-and-paste it, toss
the parts you don't want, and add the logic you need.

4) Right now FileDataModel.{set,remove}Preference() throw an
exception since these are not supported -- the implementation is
read-only. I could change this to make these methods update the
in-memory representation -- but it would not change the underlying
file, and any such updates would be lost on the next reload. Still, if
it helps meet your needs I can make that change.
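
To make the batching idea in point 1 concrete, here is a minimal
sketch in plain Java. It is not part of Taste: UpdateFileBatcher, the
pending-N.tmp temporary name, and the "userID,itemID,preference" line
format are all assumptions for illustration. It collects preferences
in memory and, when flushed, writes them as one gzipped update file
named like the /foo/data.update1.txt.gz example above.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Taste: buffers preferences in memory and
// writes each batch as one gzipped update file next to the main data file.
class UpdateFileBatcher {
    private final Path dataDir;
    private final List<String> pending = new ArrayList<>();
    private int nextUpdate = 1;

    UpdateFileBatcher(Path dataDir) {
        this.dataDir = dataDir;
    }

    // Assumed line format: userID,itemID,preference
    synchronized void add(long userID, long itemID, double preference) {
        pending.add(userID + "," + itemID + "," + preference);
    }

    // Writes all pending lines to data.updateN.txt.gz.
    // Returns the written path, or null if nothing was pending.
    synchronized Path flush() throws IOException {
        if (pending.isEmpty()) {
            return null;
        }
        // Write under a temporary name first, then rename, so that a reload
        // running at the same time is unlikely to read a half-written file.
        Path tmp = dataDir.resolve("pending-" + nextUpdate + ".tmp");
        Path target = dataDir.resolve("data.update" + nextUpdate + ".txt.gz");
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(tmp)),
                StandardCharsets.UTF_8))) {
            for (String line : pending) {
                out.write(line);
                out.write('\n');
            }
        }
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        pending.clear();
        nextUpdate++;
        return target;
    }
}
```

The servlet would call add() on each request and flush() on a timer,
followed by refresh(null) on the DataModel so the new file is read.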

The Recommender on the other hand, I would not refresh on every
request -- certainly not slope-one, as this algorithm needs a lot of
preprocessing. Instead I would refresh it periodically -- once an
hour, day -- whatever meets your performance / freshness goals.
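
As a rough sketch of that periodic refresh, assuming the servlet can
hold on to a background scheduler: PeriodicRefresher below is a
hypothetical class, not part of Taste, and it takes a plain Runnable
so that it makes no claims about the Recommender API. The task you
pass in would be whatever triggers the refresh in your setup, for
example a call to refresh(null).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Taste: runs a refresh task on a fixed
// schedule in a background thread instead of on every HTTP request.
class PeriodicRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "recommender-refresh");
                t.setDaemon(true); // do not keep the JVM alive on shutdown
                return t;
            });

    PeriodicRefresher(Runnable refreshTask, long period, TimeUnit unit) {
        // Fixed delay: the next refresh is scheduled only after the previous
        // one finishes, so slow refreshes never overlap.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                refreshTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs.
                System.err.println("Refresh failed: " + e);
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

For an hourly refresh this could be created once at servlet startup,
e.g. new PeriodicRefresher(() -> recommender.refresh(null), 1,
TimeUnit.HOURS), where recommender is your own Recommender instance.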

On Tue, Apr 21, 2009 at 6:03 AM, Matthew Roberson wrote:
> btw using FileDataModel and RecommenderServlet to run slope one Recommender
> as web service.
> I wanted to update the datamodel for each http request as more user
> preference data will be generated with each new request.
> Would this be handled by a call to the reload() function contained within
> for each HTTP request?
> It appears that this is the function call that begins the process of
> updating the datamodel via a call to processfile().
> Also, the reload function requires all ratings files to be uploaded to
> update the datamodel. I assume this is done because the Map containing users
> and their preferences is local only to the processfile() method.  I was
> planning on making this Map global so that I can update the datamodel
> without having to upload all the ratings files.  Do you see any pitfalls in
> this plan?
