mahout-user mailing list archives

From prasenjit mukherjee <>
Subject Re: user behavior based Click thru prediction
Date Thu, 19 Nov 2009 06:37:30 GMT
Sorry, resending from the correct email address.


Thanks for pitching in.  Ordering is extremely important indeed.

On Thu, Nov 19, 2009 at 12:56 AM, Ted Dunning <> wrote:
> If you want to preserve some ordering information, then you have a bit more
> of a problem.  The same basic idea can work where you model your data as a
> mixture density over sequence models.  Once you do that, then the mixture
> parameters make a reasonable space to cluster in.  If you have some kind of
> sequence model then the dirichlet process code currently in Mahout can be
> used to do your clustering.

Don't they (hidden-variable mixture models) contradict De Finetti's
basic exchangeability theorem? That is, unless you are treating each
sequence itself as a term (which I think is probably what you are
referring to) and sampling over them. In that case, how do I create
the documents?

> There is probably one too many if's in the previous paragraph for you to be
> happy with it.
> Can you say something more about your sequences?  Can you say something
> about your resources?  Do you have a good sequence model?

Basically, I want to cluster users' browsing behavior and see what the
dominant browsing paths are for a particular user. For example:
portal->sports->ad-click->movies->ad-click->ad-click etc.
I would also appreciate your thoughts on Suffix Tree Clustering based
approaches, which I have been contemplating. Meanwhile, there seems to
be a lot more work on sequence clustering in bioinformatics than in
text/web mining.
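
To make the "dominant browsing paths" idea concrete, here is a minimal
sketch in Python. It is not Suffix Tree Clustering itself: it simply
enumerates the contiguous sub-paths that a suffix tree would index and
counts how many sessions contain each one. The session data, the
function names, and the length limits are all hypothetical.

```python
from collections import Counter


def sub_paths(path, min_len=2, max_len=3):
    """Yield every contiguous sub-path of `path` whose length is in [min_len, max_len]."""
    for n in range(min_len, max_len + 1):
        for i in range(len(path) - n + 1):
            yield tuple(path[i:i + n])


def dominant_sub_paths(sessions, top_k=3, min_len=2, max_len=3):
    """Count in how many sessions each sub-path occurs and return the top_k."""
    counts = Counter()
    for path in sessions:
        # set() so a sub-path is counted at most once per session
        counts.update(set(sub_paths(path, min_len, max_len)))
    return counts.most_common(top_k)


# Hypothetical sessions for a single user (made-up data)
sessions = [
    "portal->sports->ad-click->movies->ad-click->ad-click".split("->"),
    "portal->sports->ad-click".split("->"),
    "portal->movies->ad-click".split("->"),
]

for sub_path, n_sessions in dominant_sub_paths(sessions):
    print("->".join(sub_path), n_sessions)
```

Sub-paths shared by many sessions could then serve as candidate cluster
labels, with sessions grouped by the sub-paths they contain. A real
suffix tree would avoid the brute-force enumeration for long paths.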

