Reply-To: user@mahout.apache.org
In-Reply-To: <588792.62432.qm@web112017.mail.gq1.yahoo.com>
References: <AANLkTikX2eOxHOpjiNCOLCJyVXKlaJjNfo_npLPT8ag-@mail.gmail.com>
	 <588792.62432.qm@web112017.mail.gq1.yahoo.com>
Date: Tue, 11 May 2010 22:08:27 +0100
Message-ID: <AANLkTinsuUKYdEPZIqxloqvvtZQ_KG1XGSvdBj25_de6@mail.gmail.com>
Subject: Re: RecommenderJob output
From: Sean Owen <srowen@gmail.com>
To: user@mahout.apache.org

Can you update it while it's running? Not really. It's a multi-phase
batch job and I don't think you could meaningfully change it on the
fly.

Do you need to run the whole thing every time? No, not at all. Phase 1
(mapping item IDs to item indices) doesn't need to run every time, nor
does phase 3 (counting co-occurrence). It's OK if these are a little out
of date. Phase 2 is user vector generation. I didn't write any way to
simply append a new user vector to its output, but that's easy to write,
and with it you wouldn't have to run that phase every time either.
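To make the phase 2 point concrete, here is a minimal in-memory sketch, assuming user vectors are kept as a map from user ID to a sparse map of item index to preference value. The class and method names (UserVectorStore, putPreference, vectorFor) are hypothetical and not part of Mahout. The real phase 2 output is written by a Hadoop job, so an actual append would have to write to that output instead.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for phase 2 output: user ID -> sparse vector
// (item index -> preference). Not Mahout's API; it only shows that new
// preferences can be merged into existing vectors without rebuilding them all.
final class UserVectorStore {
    private final Map<Long, Map<Integer, Float>> vectors = new HashMap<>();

    // Adds or overwrites one preference; a brand-new user gets a fresh vector.
    void putPreference(long userID, int itemIndex, float value) {
        vectors.computeIfAbsent(userID, id -> new HashMap<>()).put(itemIndex, value);
    }

    // Read-only view of one user's vector; empty if the user is unknown.
    Map<Integer, Float> vectorFor(long userID) {
        return Collections.unmodifiableMap(
                vectors.getOrDefault(userID, Collections.emptyMap()));
    }

    int userCount() {
        return vectors.size();
    }
}
```

Only the users touched by new data change, which is why appending to this output, rather than regenerating it, would be enough.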

Phases 4 and 5 are really where the recommendation happens, and those go
together. You can, though, limit which users they process by passing a
file of user IDs with --usersFile.
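As a rough illustration of what phases 3 through 5 compute together, here is a single-machine sketch of co-occurrence-based scoring, assuming a recommender along the lines of this job: count how often each pair of items appears in the same user's preferences, then score each item a user doesn't already have by summing co-occurrence counts weighted by that user's preferences. The usersToProcess set plays the role that --usersFile plays in the job. This is a simplification, not Mahout's code; CooccurrenceSketch and its methods are hypothetical, and the real job spreads this work across MapReduce phases.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, in-memory illustration; not Mahout's implementation.
final class CooccurrenceSketch {

    // Phase 3 analogue: item -> (other item -> number of users who have both).
    static Map<Long, Map<Long, Integer>> cooccurrence(Map<Long, Map<Long, Double>> userPrefs) {
        Map<Long, Map<Long, Integer>> counts = new HashMap<>();
        for (Map<Long, Double> prefs : userPrefs.values()) {
            for (Long a : prefs.keySet()) {
                for (Long b : prefs.keySet()) {
                    if (!a.equals(b)) {
                        counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // Phases 4-5 analogue, restricted to the given users (cf. --usersFile).
    static Map<Long, Map<Long, Double>> recommend(Map<Long, Map<Long, Double>> userPrefs,
                                                  Map<Long, Map<Long, Integer>> cooc,
                                                  Set<Long> usersToProcess) {
        Map<Long, Map<Long, Double>> result = new TreeMap<>();
        for (Long user : usersToProcess) {
            Map<Long, Double> prefs = userPrefs.getOrDefault(user, Map.of());
            Map<Long, Double> scores = new HashMap<>();
            for (Map.Entry<Long, Double> pref : prefs.entrySet()) {
                Map<Long, Integer> row = cooc.getOrDefault(pref.getKey(), Map.of());
                for (Map.Entry<Long, Integer> c : row.entrySet()) {
                    // Skip items the user already expressed a preference for.
                    if (!prefs.containsKey(c.getKey())) {
                        scores.merge(c.getKey(), c.getValue() * pref.getValue(), Double::sum);
                    }
                }
            }
            result.put(user, scores);
        }
        return result;
    }
}
```

Because the co-occurrence counts change slowly, a slightly stale cooc map still yields usable scores, which is the same reasoning that lets phase 3 run less often.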

I'd say the core job is nearing maturity; I think it's tuned and
debugged. But these kinds of practical hooks, like being able to
incrementally update parts of the pipeline, are exactly what's needed
next. I'd welcome your input and patches in this regard.

Sean


On Tue, May 11, 2010 at 10:00 PM, First Qaxy <qaxyf@yahoo.ca> wrote:
> One question on the recommendation lifecycle: once a RecommenderJob has
> been run and its intermediate/temp model created, what is the process of
> maintaining it? Can I update it, or parts of it, to reflect new data?
> For example, if I have a new user, or new preferences for an existing
> user, that I want to compute recommendations for, can I do that by
> incrementally updating the internal model and regenerating
> recommendations only for the user I'm interested in?
>
> Thanks.
> -qf
> --- On Tue, 5/11/10, Sean Owen <srowen@gmail.com> wrote:
>
> From: Sean Owen <srowen@gmail.com>
> Subject: Re: RecommenderJob output
> To: user@mahout.apache.org
> Cc: mahout-user@lucene.apache.org
> Received: Tuesday, May 11, 2010, 3:55 AM
>
> I just committed more of my local changes, since I'm actively
> improving and fixing things here.
>
> My output looks more reasonable:
>
> 101    [1015:4.0,1021:3.0,1020:3.0]
> 102    [1004:10.0,1005:8.0,1021:2.0,1020:2.0,1015:2.0]
> 103    [1005:12.0,1021:3.0,1015:3.0,1020:3.0]
> 105    [1005:14.0,1021:3.0,1020:3.0]
> 106    [1005:12.0,1021:4.0,1015:3.0]
>
> So you might just try the code from head. booleanData doesn't really
> affect the output; it just enables optimizations for this case.
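If you want to consume output like the lines quoted above in other code, a small parser is enough. This sketch assumes each line is a user ID, then whitespace (presumably a tab, Hadoop's default key/value separator for text output, or non-breaking spaces once a mail client has touched it), then a bracketed, comma-separated list of itemID:score pairs. RecommendationLine and parse are hypothetical names, not Mahout API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for one output line such as
// "101<whitespace>[1015:4.0,1021:3.0,1020:3.0]"; the format is assumed.
final class RecommendationLine {
    final long userID;
    final Map<Long, Double> scores; // itemID -> score, in the order listed

    private RecommendationLine(long userID, Map<Long, Double> scores) {
        this.userID = userID;
        this.scores = scores;
    }

    static RecommendationLine parse(String line) {
        // Normalize non-breaking spaces, then split into user ID and the list.
        String[] parts = line.replace('\u00A0', ' ').trim().split("\\s+", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        String list = parts[1].trim();
        if (!list.startsWith("[") || !list.endsWith("]")) {
            throw new IllegalArgumentException("Missing brackets: " + line);
        }
        Map<Long, Double> scores = new LinkedHashMap<>();
        String body = list.substring(1, list.length() - 1);
        if (!body.isEmpty()) {
            for (String entry : body.split(",")) {
                String[] kv = entry.split(":");
                if (kv.length != 2) {
                    throw new IllegalArgumentException("Bad entry: " + entry);
                }
                scores.put(Long.parseLong(kv[0].trim()), Double.parseDouble(kv[1].trim()));
            }
        }
        return new RecommendationLine(Long.parseLong(parts[0]), scores);
    }
}
```

A LinkedHashMap keeps the items in the order the job listed them, which in the sample output is descending score.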