mahout-user mailing list archives

From Pradeep Pujari <>
Subject Re: [jira] Created: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing
Date Tue, 23 Jun 2009 22:19:49 GMT
Thanks, all. This is very valuable information for a beginner. Does Mahout
require preference values to be binary or in the range -1 to +1, or can it
take any range, say from 0 to 10?

On Tue, Jun 23, 2009 at 2:12 PM, Ted Dunning <> wrote:

> This is what is traditionally done, but it is distinctly sub-optimal in
> many ways.  The most serious problem is that a heuristic decision
> determines what is important and what is not.
>
> A preferable (and as far as I know never used or implemented) approach
> would be to build a real model that includes factors that actually help
> predict the desired outcome.  Methods to do this might include:
>
> a) LLR (log-likelihood ratio) feature selection from several behavior
> types followed by IDF weighted scoring.  I have used this with additional
> follow-on steps in attrition and loss models for insurance with very good
> results, but never used it in recommendations.  The basic idea in the
> attrition and loss models was to develop positive and negative indicator
> sets for each outcome and then cluster in the space of indicator scores.
> Finally, we built ANN models over the variables formed by distances to
> cluster centroids.  For recommendations, this would mean building
> positive and negative feature sets for all items for each kind of
> behavior.  I would expect little gain from negative scores but would
> still use them.  With positive-only sets, this reduces (almost) to the
> sum of cooccurrence scores done in isolation on each kind of input.
> b) shared latent variable reductions across multiple behavior types.  For
> SVD or similar decomposition based techniques, this is equivalent to
> reducing column adjoined matrices for the independent behaviors.  Then, if
> you have only one kind of information, you can use the SVD to fill in the
> other, missing, information.
> c) probabilistic latent variable approaches.  For LDA and such, you can put
> all of the behavioral information together and use the model to predict
> missing observations in the standard Bayesian kind of way.  This is similar
> to (b), but much better founded.
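
As a rough illustration of (a), restricted to positive-only indicator sets on a single behavior type, here is a minimal Python sketch. It is not Mahout code and not necessarily the procedure described above: the function names (`llr`, `indicators`, `score`), the LLR threshold of 1.0, and the use of log(N / count) as the IDF weight are all assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import combinations


def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized entropy term used in the 2x2 log-likelihood ratio test.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of cooccurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp rounding noise


def indicators(histories, threshold=1.0):
    """histories: dict of user -> set of items (one behavior type).

    Returns (item -> set of positive indicator items, item -> user count).
    The threshold is an arbitrary illustrative value.
    """
    n = len(histories)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in histories.values():
        for i in items:
            item_count[i] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    result = defaultdict(set)
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11        # users with a but not b
        k21 = item_count[b] - k11        # users with b but not a
        k22 = n - k11 - k12 - k21        # users with neither
        positive = k11 * k22 > k12 * k21  # cooccur more than chance
        if positive and llr(k11, k12, k21, k22) >= threshold:
            result[a].add(b)
            result[b].add(a)
    return result, item_count


def score(user_items, ind, item_count, n_users):
    """Sum IDF weights of the indicators found in the user's history."""
    scores = defaultdict(float)
    for target, ind_items in ind.items():
        if target in user_items:
            continue  # do not recommend what the user already has
        for i in ind_items & user_items:
            scores[target] += math.log(n_users / item_count[i])
    return dict(scores)
```

To cover several behavior types, one would presumably run `indicators` once per type and add up the resulting scores, which is the "sum of cooccurrence scores done in isolation" case; negative indicator sets are left out of this sketch.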
> On Tue, Jun 23, 2009 at 12:23 PM, Sean Owen <> wrote:
> > For example, you could write a script that combines rating,
> > purchase history, demographics, in some way that you think is useful,
> > to produce 'preference' values.
> >
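
For concreteness, a script of the kind Sean describes might look like the sketch below. Everything specific in it is an assumption: the function names, the 0.7 / 0.3 weights, the 5-point rating scale, the purchase cap, and the `userID,itemID,preference` output layout are illustrative choices, and demographics are left out. The hand-picked weights are exactly the sort of heuristic decision discussed above.

```python
def preference(rating=None, purchases=0, w_rating=0.7, w_purchase=0.3,
               max_rating=5.0, purchase_cap=3):
    """Blend a rating and a purchase count into one value in [0, 1].

    All weights and caps are hypothetical, hand-picked values.
    """
    rating_part = rating / max_rating if rating is not None else 0.0
    purchase_part = min(purchases, purchase_cap) / purchase_cap
    return w_rating * rating_part + w_purchase * purchase_part


def to_preference_lines(rows):
    """rows: iterable of (user_id, item_id, rating_or_None, purchases).

    Returns one 'userID,itemID,preference' line per row (assumed layout).
    """
    return [
        f"{user},{item},{preference(rating, purchases):.3f}"
        for user, item, rating, purchases in rows
    ]
```

For example, `to_preference_lines([(1, 10, 5.0, 3), (2, 10, None, 1)])` yields one line per user/item pair that could be written to a file and loaded as preference data.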
> --
> Ted Dunning, CTO
> DeepDyve
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> 858-414-0013 (m)
> 408-773-0220 (fax)
