mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Can Mahout make recommendations while the recommender is being refreshed?
Date Fri, 19 Nov 2010 20:49:33 GMT
In short -- you're mostly right. There's a tradeoff between fill-and-swap
(no service interruption, but needs 2x memory), and a stop-the-world
approach.
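
For illustration only, here is a minimal Java sketch of the two approaches. The class names LockedStore and SwappedStore and the simple String-to-Double map are hypothetical stand-ins, not Mahout classes or Mahout's actual data structures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Stop-the-world (hypothetical): readers block while a refresh holds the write lock. */
class LockedStore {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Double> diffs = new HashMap<>();

  Double get(String key) {
    lock.readLock().lock();
    try {
      return diffs.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }

  void refresh(Map<String, Double> fresh) {
    lock.writeLock().lock(); // every get() stalls until this is released
    try {
      diffs.clear();
      diffs.putAll(fresh);
    } finally {
      lock.writeLock().unlock();
    }
  }
}

/** Fill-and-swap (hypothetical): build the new copy aside, then publish it in one step. */
class SwappedStore {
  private volatile Map<String, Double> diffs = new HashMap<>();

  Double get(String key) {
    return diffs.get(key); // never blocks; sees either the old or the new map
  }

  void refresh(Map<String, Double> fresh) {
    // Old and new maps are both live here, hence roughly 2x peak memory.
    Map<String, Double> next = new HashMap<>(fresh);
    diffs = next; // volatile write publishes the fully built map
  }
}
```

In the sketch, the swapped map is never mutated after it is published, which is what makes the lock-free read safe.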

FileDataModel usually does an incremental update, but it does a fill-and-swap
(as you call it) when the main file is updated.
SlopeOneRecommender does a stop-the-world refresh. If you use both, I can see
getting into a worst-of-both-worlds situation.

I think the way forward is to edit SlopeOneRecommender. It is the less
mature bit of code, and stop-the-world semantics aren't good for a system
serving requests. I think it would be complex to implement anything but
fill-and-swap. But that doubles peak memory requirements, for a component
that is definitely memory-bound.

I don't think there's any other way to do a proper refresh. I can think of
ways to do a refresh in place that are not 100% accurate (reload, but
don't throw out the old data at all). Maybe that's reasonable -- I haven't
thought it through much.
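
One possible reading of the "reload, but don't throw out the old data" idea, sketched in Java under heavy assumptions: MergingStore and its String-to-Double map are hypothetical and do not reflect how Mahout stores diffs. New values overwrite or extend the existing ones, and entries that have disappeared from the source simply stay behind.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-place, approximate refresh (hypothetical): merge new data, never remove old data. */
class MergingStore {
  private final ConcurrentHashMap<String, Double> diffs = new ConcurrentHashMap<>();

  Double get(String key) {
    return diffs.get(key); // readers are never blocked by a refresh
  }

  void refresh(Map<String, Double> fresh) {
    // Adds and overwrites entries one by one; stale keys are kept,
    // which is why the result is not 100% accurate.
    diffs.putAll(fresh);
  }
}
```

This avoids both the write lock and the second full copy, at the cost of stale entries and of readers possibly seeing a partially merged state during a refresh.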

Any comments so far?


On Fri, Nov 19, 2010 at 8:18 PM, Jordan, Eric <eric.jordan@navteq.com> wrote:

> Hi,
>
> We are developing a system that issues recommendations in real-time based
> on data from a main data file (say, /tmp/data.lst) together with daily
> update files (/tmp/data.1.lst, /tmp/data.2.lst, etc.)  We call refresh() on
> the SlopeOne recommender when the daily files are updated.  We are concerned
> about the performance while the daily update files are being loaded, and are
> interested in any feedback on what to expect.
>
> I've been looking through the Mahout code to determine whether Mahout can
> make recommendations while the (SlopeOne) recommender is being refreshed.
>
> From what I can tell, the call to refresh() ends up in
> MemoryDiffStorage.buildAverageDiffs(), where the system acquires a write
> lock.
> This would stall any calls to MemoryDiffStorage.getDiffs(), where the
> system acquires a read lock.
> So, it looks to me like the MemoryDiffStorage is taking a locking-based
> approach, rather than a fill-and-swap approach.
>
> On the other hand, FileDataModel has a reload() method containing:
>                 delegate = buildModel()
> This looks like a fill-and-swap approach that would allow the system
> to seamlessly continue to serve recommendations even while the model is
> being refreshed.
>
> Is this correct?  If so, should we be concerned about the locking of the
> MemoryDiffStorage?  Are there any workarounds?
>
> Thanks in advance!
>
> Regards,
>
> Eric
