mahout-user mailing list archives

From Sean Owen <>
Subject Re: RecommenderJob output
Date Tue, 11 May 2010 21:08:27 GMT
Can you update it while it's running? Not really. It's a multi-phase
batch job and I don't think you could meaningfully change it on the
fly.

Do you need to run the whole thing every time? No, not at all. Phase 1
(item IDs to item indices) doesn't need to run every time, nor does
phase 3 (count co-occurrence). It's OK if these are a little out of
date. Phase 2 is user vector generation; while I didn't write any
ability to simply append a new user vector to its output, it's easy to
write. So you don't have to run that every time.
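To make the "append a user vector, reuse the old co-occurrence counts" idea concrete, here is a minimal in-memory sketch. It is not the Mahout code: the data, the function names, and the scoring rule (co-occurrence count times preference value, summed over the user's items) are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence(user_vectors):
    """Count how often each ordered pair of distinct items shares a user vector."""
    counts = defaultdict(lambda: defaultdict(int))
    for prefs in user_vectors.values():
        for a, b in permutations(prefs, 2):
            counts[a][b] += 1
    return counts


def recommend(prefs, cooccurrence, how_many=3):
    """Score items the user has not seen; assumed rule: sum of count * preference."""
    scores = defaultdict(float)
    for item, value in prefs.items():
        for other, count in cooccurrence.get(item, {}).items():
            if other not in prefs:
                scores[other] += count * value
    # Highest score first; ties broken by item ID so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:how_many]


# Hypothetical user vectors (user ID -> {item ID: preference value}).
user_vectors = {
    1: {"A": 5.0, "B": 3.0},
    2: {"A": 4.0, "C": 2.0},
    3: {"B": 1.0, "C": 5.0},
}

# Computed once; allowed to go slightly stale afterwards.
cooccurrence = build_cooccurrence(user_vectors)

# A new user arrives: append the vector and recommend without recounting.
user_vectors[4] = {"A": 2.0}
recs_for_new_user = recommend(user_vectors[4], cooccurrence)
```

The new user's own preferences are missing from the counts until the next full recount, which is the "a little out of date" trade-off described above.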

Phases 4 and 5 are really where the recommendation happens. Those go
together. You can limit which users they process, though, with a file
of user IDs, --usersFile.
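A small helper for producing such a file might look like the sketch below. The layout (one numeric user ID per line) is an assumption, since only the option name is given here, and write_users_file is a hypothetical name, not part of Mahout.

```python
import tempfile
from pathlib import Path


def write_users_file(path, user_ids):
    """Write unique user IDs, sorted, one per line (assumed file layout)."""
    unique_ids = sorted({int(u) for u in user_ids})
    Path(path).write_text("".join(f"{u}\n" for u in unique_ids))
    return unique_ids


# Example: only users 101 and 106 need fresh recommendations.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "users.txt"
    written = write_users_file(target, [106, 101, 101])
    contents = target.read_text()
```

The resulting path is what you would hand to --usersFile; check the expected format against the job's own usage text before relying on it.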

I'd say the core job is nearing maturity -- I think it's tuned and
debugged. But these kinds of practical hooks, like being able to
incrementally update parts of the pipeline, are exactly what's
needed next. I'd welcome your input and patches in this regard.


On Tue, May 11, 2010 at 10:00 PM, First Qaxy <> wrote:
> One question on the recommendation lifecycle: once a RecommenderJob has been run and
> the intermediate/temp model has been created, what is the process for maintaining it? Can I
> update it, or parts of it, to reflect new data?
> For example, if I have a new user, or new preferences for an existing user, that I want
> to compute recommendations for, can I do that by incrementally updating the internal model
> and regenerating recommendations only for the user I'm interested in?
> Thanks.
> -qf
> --- On Tue, 5/11/10, Sean Owen <> wrote:
> From: Sean Owen <>
> Subject: Re: RecommenderJob output
> Received: Tuesday, May 11, 2010, 3:55 AM
> I just committed more of my local changes, since I'm actively
> improving and fixing things here.
> My output looks more reasonable:
> 101     [1015:4.0,1021:3.0,1020:3.0]
> 102     [1004:10.0,1005:8.0,1021:2.0,1020:2.0,1015:2.0]
> 103     [1005:12.0,1021:3.0,1015:3.0,1020:3.0]
> 105     [1005:14.0,1021:3.0,1020:3.0]
> 106     [1005:12.0,1021:4.0,1015:3.0]
> So you might just try the code from head. booleanData doesn't really
> affect the output, it just enables optimizations for this case.
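
For anyone post-processing output shaped like the quoted lines above, a parser could be sketched as follows. It assumes each line is a user ID, whitespace, then a bracketed comma-separated list of itemID:score pairs, as in the sample; parse_recommendation_line is a hypothetical helper, not a Mahout API.

```python
def parse_recommendation_line(line):
    """Parse 'userID [item:score,item:score,...]' into (user_id, [(item_id, score), ...])."""
    user_part, items_part = line.split(None, 1)
    items_part = items_part.strip()
    if not (items_part.startswith("[") and items_part.endswith("]")):
        raise ValueError(f"expected a bracketed item list, got: {items_part!r}")
    body = items_part[1:-1]
    recs = []
    if body:  # an empty list "[]" yields no recommendations
        for pair in body.split(","):
            item_id, score = pair.split(":")
            recs.append((int(item_id), float(score)))
    return int(user_part), recs


# First sample line from the quoted output.
user_id, recs = parse_recommendation_line("101     [1015:4.0,1021:3.0,1020:3.0]")
```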
