Thanks, all. This is very valuable information for a beginner. Does Mahout
require preference values to be binary or in the range -1 to +1, or can it
take any range, say from 0 to 10?
Thanks,
Pradeep.
On Tue, Jun 23, 2009 at 2:12 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> This is what is traditionally done, but it is distinctly suboptimal in
> many ways. The most serious problem is that a heuristic decision
> determines what is important and what is not.
>
> A preferable (and, as far as I know, never used or implemented) approach
> would be to build a real model that includes factors that actually help
> predict the desired outcome. Methods to do this might include:
>
> a) LLR feature selection from several behavior types followed by
> IDF-weighted scoring. I have used this, with additional follow-on steps,
> in attrition and loss models for insurance with very good results, but
> never in recommendations. The basic idea in the attrition and loss
> models was to develop positive and negative indicator sets for each
> outcome and then cluster in the space of indicator scores. Finally, we
> built ANN models over the variables formed by distances to cluster
> centroids. For recommendations, this would mean building positive and
> negative feature sets for all items for each kind of behavior. I would
> expect little gain from negative scores but would still use them. With
> positive-only sets, this reduces (almost) to the sum of cooccurrence
> scores computed in isolation on each kind of input.
>
> b) shared latent variable reductions across multiple behavior types. For
> SVD or similar decomposition-based techniques, this is equivalent to
> reducing column-adjoined matrices for the independent behaviors. Then, if
> you have only one kind of information, you can use the SVD to fill in the
> other, missing, information.
>
> c) probabilistic latent variable approaches. For LDA and such, you can put
> all of the behavioral information together and use the model to predict
> missing observations in the standard Bayesian kind of way. This is similar
> to (b), but much better founded.
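
On option (a), here is a minimal sketch of the LLR selection step only (the
IDF-weighted scoring and the later clustering are not shown). It uses the
standard log-likelihood ratio (G^2) statistic on a 2x2 contingency table,
written in its entropy form. The function names, the layout of the counts,
and the threshold are assumptions made for illustration; they are not taken
from this thread or from Mahout.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of counts.

    k11: feature present, outcome present
    k12: feature present, outcome absent
    k21: feature absent,  outcome present
    k22: feature absent,  outcome absent
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def select_indicators(counts, threshold):
    """Split features into positive and negative indicator sets.

    counts: {feature: (k11, k12, k21, k22)} (hypothetical input layout).
    A feature is kept if its LLR exceeds the threshold; the sign of the
    association decides which set it lands in.
    """
    positive, negative = set(), set()
    for feature, (k11, k12, k21, k22) in counts.items():
        if llr(k11, k12, k21, k22) <= threshold:
            continue
        if k11 * k22 > k12 * k21:
            positive.add(feature)
        else:
            negative.add(feature)
    return positive, negative
```

For recommendations, one table would be built per (feature item, target
item, behavior type) triple, which is where the per-behavior indicator sets
in (a) would come from.
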
>
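
On option (b), a rough sketch of what "reducing column-adjoined matrices"
could look like with NumPy. Two user-by-item matrices for two behavior types
are joined side by side, a truncated SVD is taken of the joined matrix, and a
user with only the first kind of data is projected into the shared latent
space to estimate the second kind. The names fit_joint_svd and fill_in, the
choice of k, and the least-squares projection are my assumptions, not
something stated in the thread.

```python
import numpy as np


def fit_joint_svd(A, B, k):
    """Truncated SVD of the column-adjoined matrix [A | B].

    A, B: user-by-item matrices for two behavior types (same users).
    Returns the rank-k row-space basis, split back into its A and B parts.
    """
    M = np.hstack([A, B])
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    basis = s[:k, None] * Vt[:k]          # shape: k x (nA + nB)
    n_a = A.shape[1]
    return basis[:, :n_a], basis[:, n_a:]


def fill_in(a, basis_A, basis_B):
    """Estimate a user's missing B-row from their observed A-row.

    Solves a ~= z @ basis_A for the latent vector z by least squares,
    then maps z through the B part of the basis.
    """
    z, *_ = np.linalg.lstsq(basis_A.T, a, rcond=None)
    return z @ basis_B
```

With real, sparse behavior data the missing entries are not true zeros, so
a plain SVD like this is only a starting point for experimentation.
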
> On Tue, Jun 23, 2009 at 12:23 PM, Sean Owen <srowen@gmail.com> wrote:
>
> > For example, you could write a script that combines rating,
> > purchase history, demographics, in some way that you think is useful,
> > to produce 'preference' values.
> >
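
A minimal sketch of the kind of script Sean describes, combining a rating,
a purchase flag, and a view count into a single preference value. The
weights, the 0 to 10 output scale, the cap on views, and the
"userID,itemID,value" output layout are all assumptions chosen for
illustration. This is exactly the sort of heuristic weighting that Ted's
reply cautions about.

```python
def preference(rating=None, purchased=False, views=0):
    """Blend several signals into one score on a 0..10 scale.

    rating: optional explicit rating on a 1..5 scale.
    purchased: whether the user bought the item.
    views: number of page views, capped at 10.
    The 0.6 / 0.3 / 0.1 weights are hypothetical.
    """
    score = 0.0
    if rating is not None:
        score += 0.6 * (rating - 1) / 4      # map 1..5 onto 0..1
    if purchased:
        score += 0.3
    score += 0.1 * min(views, 10) / 10
    return round(10 * score, 2)


def to_csv_lines(events):
    """events: iterable of (user_id, item_id, rating, purchased, views)."""
    return [
        f"{user},{item},{preference(rating, purchased, views)}"
        for user, item, rating, purchased, views in events
    ]
```

Whatever scale is chosen, it should be applied consistently to every user
and item so that the values stay comparable.
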
>
> Ted Dunning, CTO
> DeepDyve
