mahout-user mailing list archives

From Ted Dunning
Subject Re: Clustering demographic data
Date Fri, 15 Jul 2011 20:16:42 GMT
A typical workflow for this is to define a disjoint set of demographic
groups and then train a classifier that has access to user actions and
"free" geo-demographic data such as IP, geo-IP, time of day and email
domain.  If you have metadata from the actions, then you can augment these
variables by joining the action history to the metadata and including that
in your training data.
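
As a rough illustration of the join described above, here is a minimal sketch in plain Python. The record layout, the field names (`item`, `category`, `email_domain`, `group`) and the sample values are all hypothetical, made up for this example; nothing here is Mahout API. It counts each user's actions per metadata category and attaches the demographic group as the label.

```python
from collections import Counter

# Hypothetical inputs; every field name and value is illustrative only.
actions = [
    {"user": "u1", "item": "i1", "hour": 9},
    {"user": "u1", "item": "i2", "hour": 22},
    {"user": "u2", "item": "i2", "hour": 23},
]
item_meta = {"i1": {"category": "sports"}, "i2": {"category": "music"}}
user_info = {
    "u1": {"email_domain": "example.com", "group": "A"},
    "u2": {"email_domain": "example.org", "group": "B"},
}


def build_training_rows(actions, item_meta, user_info):
    """Join action history to item metadata; emit one labelled row per user."""
    category_counts = {}
    for a in actions:
        # Look up the metadata for the acted-on item (the "join" step).
        category = item_meta.get(a["item"], {}).get("category", "unknown")
        category_counts.setdefault(a["user"], Counter())[category] += 1

    rows = []
    for user, info in sorted(user_info.items()):
        # Start from the "free" variables, then add metadata-derived counts.
        features = {"email_domain": info["email_domain"]}
        for category, n in category_counts.get(user, {}).items():
            features["n_" + category] = n
        rows.append({"user": user, "features": features, "label": info["group"]})
    return rows


rows = build_training_rows(actions, item_meta, user_info)
```

Each row then holds one user's features plus the demographic group to predict, which is the shape a classifier trainer would typically expect.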

Once you have the training data, I would do the standard sort of exploratory
data analysis using a tool like R.  If you verify with R that relatively
simple models show predictive lift, then you can switch to training with
Mahout to get a deployable model.  R is great for agile, interactive
analysis.  Mahout is great for scaling and deployability.  Use both.
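
One simple way to check for predictive lift before moving to Mahout is sketched below. It is written in Python rather than R purely for illustration, and it assumes "lift" means the rate of a demographic group among users matching some feature, divided by that group's overall base rate. The `night_user` feature and the sample rows are hypothetical.

```python
def lift(rows, predicate, group):
    """Rate of `group` among rows matching `predicate`, divided by the base rate."""
    base = sum(r["label"] == group for r in rows) / len(rows)
    matched = [r for r in rows if predicate(r)]
    if not matched or base == 0:
        return None  # lift is undefined without matches or a non-zero base rate
    rate = sum(r["label"] == group for r in matched) / len(matched)
    return rate / base


# Hypothetical labelled rows: one made-up binary feature and a group label.
rows = [
    {"night_user": True, "label": "A"},
    {"night_user": True, "label": "A"},
    {"night_user": True, "label": "B"},
    {"night_user": False, "label": "B"},
    {"night_user": False, "label": "B"},
    {"night_user": False, "label": "A"},
]

value = lift(rows, lambda r: r["night_user"], "A")
print(value)
```

A value noticeably above 1 on held-out data suggests the feature carries signal about the group; a value near 1 suggests it does not.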

On Fri, Jul 15, 2011 at 1:07 PM, Clive Cox wrote:

> Hi,
>  If one has an implicit dataset of users and actions (purchases, page
> clicks for example) and also has demographics for those users (age,
> gender, location, etc.), does Mahout have any algorithms that could be
> used to cluster users/actions by the user's demographics? So one could
> derive information on certain user groups doing certain actions.
> I suppose one can use the CF algorithms/Matrix factorization on just the
> user/action data and then map that back onto the demographics, and then
> try to see if that maps onto any significant demographic group?
> But are there clustering algorithms in Mahout that work directly on the
> demographic data in this situation?
>  Thanks,
>  Clive
