mahout-dev mailing list archives

From Vivek Khanna <vivekkha...@hotmail.com>
Subject RE: User/Items Reco Engine clustering
Date Thu, 24 Jun 2010 13:14:57 GMT

Another way to look at the problem is to consider user purchases/actions as features describing
a user in a vector space. Then the problem is reduced to finding users similar to each other
based on this feature set.
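
A minimal sketch of this idea, using the John/Scott/Larry example quoted further down in this thread. It assumes binary "viewed" features and cosine similarity; both are illustrative choices, and the names `views`, `cosine` and `recommend` are hypothetical, not part of Mahout.

```python
from math import sqrt

# View history from the example quoted later in this thread.
# Each user is a binary feature vector, stored as the set of viewed items.
views = {
    "John": {"jeans", "cowboy hat", "red shirt", "boots"},
    "Scott": {"jeans", "cowboy hat", "red shirt", "pocket watch"},
    "Larry": {"jeans", "cowboy hat", "red shirt"},
}


def cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, views, k=2):
    """Score unseen items by the summed similarity of the k most similar users."""
    seen = views[user]
    neighbors = sorted(
        ((cosine(seen, items), other)
         for other, items in views.items() if other != user),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbors:
        for item in views[other] - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(recommend("Larry", views))  # ['boots', 'pocket watch']
```

No clusters are built here: the neighborhood is recomputed per user, so it shifts on its own as more history accumulates.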

Clustering would be overly complex in my humble opinion.

I agree with Sean that the Lucene-based construction, as you describe it, Jay, is item-based
and not user-based.
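
A minimal sketch of the item-based reading, assuming items are compared by the set of users who viewed them. Cosine similarity is an assumed measure, the names `views`, `invert` and `item_based` are hypothetical, and the sketch uses neither Lucene nor Mahout.

```python
from math import sqrt

# View history from the example quoted later in this thread.
views = {
    "John": {"jeans", "cowboy hat", "red shirt", "boots"},
    "Scott": {"jeans", "cowboy hat", "red shirt", "pocket watch"},
    "Larry": {"jeans", "cowboy hat", "red shirt"},
}


def invert(views):
    """Turn a user -> items mapping into an item -> users mapping."""
    item_users = {}
    for user, items in views.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)
    return item_users


def item_based(user, views):
    """Score each unseen item by its summed similarity to the user's items."""
    item_users = invert(views)
    seen = views[user]
    scores = {}
    for candidate, users in item_users.items():
        if candidate in seen:
            continue
        # Cosine similarity between two items, each a set of viewers.
        scores[candidate] = sum(
            len(users & item_users[s]) / sqrt(len(users) * len(item_users[s]))
            for s in seen
        )
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(item_based("Larry", views))  # ['boots', 'pocket watch']
```

The difference from a user-based approach is what gets compared: here similarity is computed between items, and the user's own history only selects which item similarities to add up.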

Hope this helps.

> Date: Wed, 23 Jun 2010 08:59:03 +0100
> Subject: Re: User/Items Reco Engine clustering
> From: srowen@gmail.com
> To: dev@mahout.apache.org
> 
> To me you're just describing user-based recommendation. You find a
> neighborhood of similar users, then examine their items, and recommend
> from those by taking a weighted average of the neighborhood's
> preferences.
> 
> Your Lucene-based construction then sounds like item-based
> recommendation. Find items similar to what the user prefers and
> recommend based on a weighted average, again.
> 
> Do I have that right?
> 
> And then, do you need a Hadoop-based implementation using SequenceFiles?
> What kind of data size are you looking at?
> 
> On Wed, Jun 23, 2010 at 12:49 AM, Jay Sellers <jaysellers@gmail.com> wrote:
> > Thanks Vivek,
> > We do not have predefined clusters/groups. We expect the groups to mutate as
> > more history (data) is accumulated.  A simple use case is as follows:
> > John has viewed a pair of jeans, a cowboy hat, a red shirt and a pair of
> > boots.
> > Scott has viewed a pair of jeans, a cowboy hat, a red shirt and a pocket
> > watch.
> > Larry has viewed a pair of jeans, a cowboy hat and a red shirt.
> >
> > When we send Larry and his items into our reco engine, we would expect a
> > pair of boots and a pocket watch to be recommended.  We'd expect this
> > because we've determined that John and Scott are 'like' Larry and thus are
> > in the same cluster.
> >
> > Again, we fully expect the cluster members to change, as user/item data
> > accumulates.
> >
> > On Tue, Jun 22, 2010 at 4:37 PM, Vivek Khanna <vivekkhanna@hotmail.com> wrote:
> >
> >>
> >> Hi,
> >>
> >>
> >>
> >> For your clustering/grouping, what is your expectation? Do you have
> >> pre-defined clusters/groups that you want to assign the items to, or do
> >> you envision a system where clusters/groups will change and evolve as
> >> the data changes?
> >>
> >>
> >>
> >> In either case, it seems you are looking for unsupervised approaches. Is that
> >> correct?
> >>
> >>
> >>
> >> I am new to this email list, so pardon my ignorance, but from what work I
> >> have done in the past with IR, ML (clustering, More like this,
> >> categorization, topic detection etc.), my advice to you is to identify your
> >> requirements, use cases and page flow interactions as the first step. :)
> >>
> >>
> >>
> >> Hope this helps!
> >>
> >> Vivek.
> >>
> >> > Date: Tue, 22 Jun 2010 15:50:18 -0700
> >> > Subject: User/Items Reco Engine clustering
> >> > From: jaysellers@gmail.com
> >> > To: dev@mahout.apache.org
> >> >
> >> > I'm looking to enhance a product recommendation engine. It currently works
> >> > with all data as a whole. I want to introduce clustering/grouping. It's
> >> > model-based, and the relationship is the common User-Items relationship.
> >> > Originally I was thinking of using a Canopy / k-means cluster, and then
> >> > determining which cluster a user is in and executing Item Similarity against
> >> > only that cluster of items. However, I can't figure out how to build a
> >> > SequenceFile using vectors with the User/Items relationship. I don't know
> >> > which data points to feed the vector. So I scratched that idea and turned
> >> > my attention to Lucene, thinking that this is really a document issue, where
> >> > users are documents and the items are the content. I should be able to ask
> >> > Lucene, give me documents that look like this "productId3 productId9056
> >> > productId234".
> >> >
> >> > I'm looking for any and all feedback from those experienced in the
> >> > recommendation world, specifically with the grouping of users and items.
> >> >
> >> > Thanks,
> >> > -Jay
> >>
> >
 		 	   		  