In short -- you're mostly right. There's a tradeoff between fill-and-swap
(no service interruption, but needs 2x memory), and a stop-the-world
approach.
FileDataModel usually does an incremental update, but will fill-and-swap
(as you call it) when the main file is updated.
SlopeOneRecommender does a stop-the-world refresh. If you use both, I can
see you getting into a worst-of-both-worlds situation: double the memory
during the data reload, and a stall while the diffs are rebuilt.
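
To make "stop-the-world" concrete, here is a rough sketch of the locking
pattern, assuming a simplified map-backed store. LockingDiffStore and its
methods are made-up names for illustration, not the real MemoryDiffStorage
code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a diff store; not Mahout's MemoryDiffStorage.
final class LockingDiffStore {

  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, Double> diffs = new HashMap<>();

  // Readers share the read lock, so they only wait while a refresh runs.
  double getDiff(long itemID) {
    lock.readLock().lock();
    try {
      Double d = diffs.get(itemID);
      return d == null ? 0.0 : d;
    } finally {
      lock.readLock().unlock();
    }
  }

  // The write lock excludes every reader for the whole rebuild.
  void refresh(Map<Long, Double> freshData) {
    lock.writeLock().lock();
    try {
      diffs.clear();
      diffs.putAll(freshData);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Only one copy of the data is held in memory at any time, but every
getDiff() call waits for as long as refresh() holds the write lock.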
I think the way forward is to change SlopeOneRecommender, since it is the
less mature bit of code and stop-the-world semantics aren't good for a
system serving requests in real time. I think anything other than
fill-and-swap would be complex to implement, but fill-and-swap doubles
peak memory requirements for a component that is definitely memory-bound.
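
A minimal sketch of the fill-and-swap idea, under the same simplifying
assumptions (SwappingDiffStore is a hypothetical class, not a proposed
patch to SlopeOneRecommender):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a diff store; not part of Mahout.
final class SwappingDiffStore {

  // Readers always see one complete, unmodifiable snapshot.
  private final AtomicReference<Map<Long, Double>> current =
      new AtomicReference<>(Collections.<Long, Double>emptyMap());

  // No lock on the read path: a refresh in progress never blocks it.
  double getDiff(long itemID) {
    Double d = current.get().get(itemID);
    return d == null ? 0.0 : d;
  }

  // Fill: build the replacement off to the side. The old and new maps
  // coexist until the swap, which is where the 2x peak memory comes from.
  // Swap: publish the finished map with a single atomic write.
  void refresh(Map<Long, Double> freshData) {
    Map<Long, Double> replacement = new HashMap<>(freshData);
    current.set(Collections.unmodifiableMap(replacement));
  }
}
```

Readers that fetched the old snapshot just before the swap finish against
the old data; the old map becomes garbage once they are done.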
I don't think there's any other way to do a proper refresh. I can think of
ways to do a refresh in place that are not 100% accurate (reload, but
don't throw out the old data at all). Maybe that's reasonable -- I haven't
thought it through much.
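
One possible reading of "reload, but don't throw out the old data", again
only as a sketch with a made-up class (MergingDiffStore), and assuming the
data fits a simple key-to-value map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a diff store; not part of Mahout.
final class MergingDiffStore {

  private final ConcurrentHashMap<Long, Double> diffs =
      new ConcurrentHashMap<>();

  double getDiff(long itemID) {
    Double d = diffs.get(itemID);
    return d == null ? 0.0 : d;
  }

  // Adds or overwrites entries in place and never removes anything, so
  // entries that vanished from the source data linger as stale values.
  void refresh(Map<Long, Double> freshData) {
    diffs.putAll(freshData);
  }

  int size() {
    return diffs.size();
  }
}
```

This avoids both the second copy and the global lock, at the cost that
readers can see a mix of old and new entries during the reload, and
deleted data never goes away.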
Any comments so far?
On Fri, Nov 19, 2010 at 8:18 PM, Jordan, Eric <eric.jordan@navteq.com> wrote:
> Hi,
>
> We are developing a system that issues recommendations in real-time based
> on data from a main data file (say, /tmp/data.lst) together with daily
> update files (/tmp/data.1.lst, /tmp/data.2.lst, etc.) We call refresh() on
> the SlopeOne recommender when the daily files are updated. We are concerned
> about the performance while the daily update files are being loaded, and are
> interested in any feedback on what to expect.
>
> I've been looking through the Mahout code to determine whether Mahout can
> make recommendations while the (SlopeOne) recommender is being refreshed.
>
> From what I can tell, the call to refresh() ends up in
> MemoryDiffStorage.buildAverageDiffs(), where the system acquires a write
> lock.
> This would stall any calls to MemoryDiffStorage.getDiffs(), where the
> system acquires a read lock.
> So, it looks to me like the MemoryDiffStorage is taking a locking-based
> approach, rather than a fill-and-swap approach.
>
> On the other hand, FileDataModel has a reload() method with:
> delegate = buildModel()
> Which looks like a fill-and-swap based approach that would allow the system
> to seamlessly continue to serve recommendations even while the model is
> being refreshed.
>
> Is this correct? If so, should we be concerned about the locking of the
> MemoryDiffStorage? Are there any workarounds?
>
> Thanks in advance!
>
> Regards,
>
> Eric
>
>
>