mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Understanding Mahout Algos and Applications
Date Thu, 04 Aug 2011 11:32:37 GMT
(Josh and I had spoken separately.)

I think he's interested in perhaps learning those similarities, indeed.
As a rough-and-ready start, I'd suggested pure collaborative filtering based
on user and item associations only. Later, you can work in user-user
similarity, learned elsewhere, to improve things.
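Below is a rough sketch of the "pure collaborative filtering on user and item associations only" starting point. It is plain Java, not Mahout's API. Items are compared by the Tanimoto (Jaccard) coefficient of the sets of users associated with them. A user's unseen items are then ranked by their summed similarity to the items that user already has. The class and method names are hypothetical. Treating every interaction as a boolean association, and so ignoring view times and feedback strength, is an assumption made to keep the sketch short.

```java
import java.util.*;

/**
 * Item-based collaborative filtering over boolean user-item associations only
 * (no ratings, no demographics). Plain Java sketch; not Mahout's API.
 */
public class AssociationRecommender {

    // item -> users associated with it, and user -> items associated with them
    private final Map<String, Set<String>> usersByItem = new HashMap<>();
    private final Map<String, Set<String>> itemsByUser = new HashMap<>();

    /** Records that a user interacted with an item (e.g. a view, click, or positive feedback). */
    public void addAssociation(String user, String item) {
        usersByItem.computeIfAbsent(item, k -> new HashSet<>()).add(user);
        itemsByUser.computeIfAbsent(user, k -> new HashSet<>()).add(item);
    }

    /** Tanimoto (Jaccard) coefficient: shared users divided by users of either item. */
    public double itemSimilarity(String a, String b) {
        Set<String> usersA = usersByItem.getOrDefault(a, Set.of());
        Set<String> usersB = usersByItem.getOrDefault(b, Set.of());
        Set<String> common = new HashSet<>(usersA);
        common.retainAll(usersB);
        int union = usersA.size() + usersB.size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    /** Ranks items the user has not seen by summed similarity to the items they have seen. */
    public List<String> recommend(String user, int howMany) {
        Set<String> seen = itemsByUser.getOrDefault(user, Set.of());
        Map<String, Double> scores = new HashMap<>();
        for (String candidate : usersByItem.keySet()) {
            if (seen.contains(candidate)) continue;
            double score = 0.0;
            for (String s : seen) score += itemSimilarity(candidate, s);
            if (score > 0.0) scores.put(candidate, score);
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x));
            return byScore != 0 ? byScore : x.compareTo(y); // deterministic tie-break
        });
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        AssociationRecommender r = new AssociationRecommender();
        r.addAssociation("u1", "siteA");
        r.addAssociation("u1", "siteB");
        r.addAssociation("u2", "siteA");
        r.addAssociation("u2", "siteB");
        r.addAssociation("u2", "siteC");
        r.addAssociation("u3", "siteA");
        r.addAssociation("u3", "siteD");
        System.out.println(r.recommend("u1", 2)); // prints [siteC, siteD]
    }
}
```

Here siteC outranks siteD for u1. siteC shares a user with both of u1's items, while siteD shares a user only with siteA.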

2011/8/4 Christopher Jordan <cjordan@gilt.com>

> I actually disagree with that statement. While Mahout is built on Hadoop,
> distributed computing is not a factor in whether or not you can model your
> data.
>
> Josh, it sounds like you already know a fair bit about your users. In that
> case, why not leverage your demographic data to group them yourself using
> domain knowledge that makes sense. For example, use zip code and age to
> group them regionally and by age group. Then you can try to build a
> recommendation engine for each group of users. If you don't know enough
> about your users to make those kinds of groups, it sounds like you might
> need to do some exploratory statistics on them.
>
> On Aug 4, 2011, at 2:21 AM, 戴清灏 wrote:
>
> > Hi,
> >    I think the core issue is not how this engine works, but whether Mahout
> >    fits your data size.
> >    Mahout is built on Hadoop, which is designed to digest big data.
> >    If your data size is not that huge, or is incompatible with the MapReduce
> >    model, it may not be a good idea.
> >    Regards.
> >    Roger
> >
> > 2011/8/4 Josh Dulberger <jidulberger@gmail.com>
> >
> >> Hello,
> >>
> >> I have some familiarity with machine learning (in an academic setting) but
> >> am looking for some assistance on which Mahout algorithms would suit my
> >> needs.
> >>
> >> I am doing consumer behavior research at a web-marketing startup, where we
> >> generate a decent amount of data. We track behavioral data - engagement
> >> stats, view-times, feedback - and also have demographic data. We also have
> >> an inventory of items/sites, and some rudimentary (manual) categorizations.
> >>
> >> We were just approved for a data warehouse to integrate our data and I have
> >> approval to begin working on a consumer targeting platform. The core idea is
> >> to match consumers with items, testing different approaches for different
> >> classes of consumers and items. I expect to be looking at item-similarity,
> >> consumer-similarity, and hybrid models, and eventually incorporate global
> >> trends.
> >>
> >> Initially, I think we can start with a recommender engine, then develop a
> >> clustering or classification model. But I would now like more insight into
> >> what kinds of questions each is best at answering and how they fit
> >> together. So far, my understanding of the difference is that recommenders
> >> accept input of users, positive/negative scoring, item, and timestamp,
> >> then output a recommendation (with variation depending on the specific
> >> algo).
> >>
> >> This leaves out demographic data (age, gender, zip, or even socioeconomic
> >> status). I gather that clustering algos can incorporate this kind of data
> >> (and more) in order to find natural groupings. Is the natural connection
> >> point to find similar users and items using clustering, then feed that into
> >> a recommender? How does this feeding work? Or, if the above is at all
> >> right-headed, what are some options as to how to make the connection?
> >>
> >> I appreciate in advance any answers, ideas, insights, or even questions any
> >> of you may have.
> >>
> >> Thanks,
> >>
> >> Josh
> >>
>
>
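Below is a minimal sketch of Christopher's segment-then-recommend suggestion. It is plain Java, not Mahout's API. The segment rule (first three ZIP digits plus a ten-year age band) is a hypothetical example of a hand-chosen grouping. The per-segment "recommender" is just in-segment popularity, which keeps the grouping step visible. Any real recommender could be trained per segment instead.

```java
import java.util.*;

/**
 * Segment-then-recommend sketch: users are grouped by a hand-chosen
 * demographic rule, and each group gets its own (deliberately simple)
 * recommender. Plain Java, not Mahout's API; the segment rule is hypothetical.
 */
public class SegmentedRecommender {

    private final Map<String, String> segmentByUser = new HashMap<>();
    // segment -> item -> number of distinct users in that segment associated with the item
    private final Map<String, Map<String, Integer>> countsBySegment = new HashMap<>();
    private final Map<String, Set<String>> itemsByUser = new HashMap<>();

    /** Hypothetical grouping rule: first three ZIP digits plus a ten-year age band. */
    public static String segmentOf(String zip, int age) {
        String region = zip.length() >= 3 ? zip.substring(0, 3) : zip;
        int bandStart = (age / 10) * 10;
        return region + ":" + bandStart + "-" + (bandStart + 9);
    }

    public void addUser(String user, String zip, int age) {
        segmentByUser.put(user, segmentOf(zip, age));
    }

    /** Records a user-item association; repeat interactions count once per user. */
    public void addAssociation(String user, String item) {
        String segment = segmentByUser.get(user);
        if (segment == null) throw new IllegalArgumentException("unknown user: " + user);
        if (itemsByUser.computeIfAbsent(user, k -> new HashSet<>()).add(item)) {
            countsBySegment.computeIfAbsent(segment, k -> new HashMap<>()).merge(item, 1, Integer::sum);
        }
    }

    /** Most popular items in the user's own segment that the user has not seen yet. */
    public List<String> recommend(String user, int howMany) {
        Map<String, Integer> counts =
            countsBySegment.getOrDefault(segmentByUser.get(user), Map.of());
        Set<String> seen = itemsByUser.getOrDefault(user, Set.of());
        List<String> ranked = new ArrayList<>();
        for (String item : counts.keySet()) {
            if (!seen.contains(item)) ranked.add(item);
        }
        ranked.sort((x, y) -> {
            int byCount = Integer.compare(counts.get(y), counts.get(x));
            return byCount != 0 ? byCount : x.compareTo(y); // deterministic tie-break
        });
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        SegmentedRecommender r = new SegmentedRecommender();
        r.addUser("u1", "10001", 34);
        r.addUser("u2", "10013", 31);
        r.addUser("u3", "10025", 38);
        r.addUser("u4", "94105", 62);
        r.addAssociation("u1", "siteB");
        r.addAssociation("u2", "siteA");
        r.addAssociation("u2", "siteB");
        r.addAssociation("u3", "siteA");
        r.addAssociation("u3", "siteD");
        r.addAssociation("u4", "siteC");
        System.out.println(r.recommend("u1", 5)); // prints [siteA, siteD]
    }
}
```

Here u1 gets suggestions only from other users in segment `100:30-39`. siteC is popular only in u4's segment, so it is never suggested to u1.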

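Sean suggested working in user-user similarity learned elsewhere, and Josh asked how that "feeding" might work. The sketch below shows one possible approach: a user-based estimate in which each neighbor's weight blends behavioral cosine similarity with an external similarity function. In the example, the external function is a hypothetical cluster-membership indicator: 1.0 if two users landed in the same demographic cluster, else 0.0. The linear blend with weight `alpha` is an assumption for illustration, not a Mahout feature, and the code is plain Java.

```java
import java.util.*;
import java.util.function.ToDoubleBiFunction;

/**
 * User-based CF whose user-user similarity blends a behavioral score with an
 * externally learned one (e.g. from a clustering run elsewhere).
 * Hypothetical sketch; the linear weighting scheme is an assumption.
 */
public class BlendedUserRecommender {

    private final Map<String, Map<String, Double>> prefs = new HashMap<>(); // user -> item -> preference
    private final ToDoubleBiFunction<String, String> externalSimilarity;
    private final double alpha; // weight on behavioral similarity, in [0, 1]

    public BlendedUserRecommender(ToDoubleBiFunction<String, String> externalSimilarity, double alpha) {
        if (alpha < 0.0 || alpha > 1.0) throw new IllegalArgumentException("alpha must be in [0, 1]");
        this.externalSimilarity = externalSimilarity;
        this.alpha = alpha;
    }

    public void addPreference(String user, String item, double value) {
        prefs.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
    }

    /** Cosine similarity computed over co-rated items only; 0 if there are none. */
    double behavioralSimilarity(String u, String v) {
        Map<String, Double> pu = prefs.getOrDefault(u, Map.of());
        Map<String, Double> pv = prefs.getOrDefault(v, Map.of());
        double dot = 0.0, normU = 0.0, normV = 0.0;
        for (Map.Entry<String, Double> e : pu.entrySet()) {
            Double other = pv.get(e.getKey());
            if (other == null) continue;
            dot += e.getValue() * other;
            normU += e.getValue() * e.getValue();
            normV += other * other;
        }
        return (normU == 0.0 || normV == 0.0) ? 0.0 : dot / Math.sqrt(normU * normV);
    }

    /** Blended similarity: alpha * behavioral + (1 - alpha) * external. */
    public double similarity(String u, String v) {
        return alpha * behavioralSimilarity(u, v)
            + (1.0 - alpha) * externalSimilarity.applyAsDouble(u, v);
    }

    /** Similarity-weighted average of other users' preferences for the item; NaN if none apply. */
    public double estimatePreference(String user, String item) {
        double num = 0.0, den = 0.0;
        for (Map.Entry<String, Map<String, Double>> e : prefs.entrySet()) {
            if (e.getKey().equals(user)) continue;
            Double p = e.getValue().get(item);
            if (p == null) continue;
            double s = similarity(user, e.getKey());
            if (s <= 0.0) continue; // ignore unrelated or dissimilar users
            num += s * p;
            den += s;
        }
        return den == 0.0 ? Double.NaN : num / den;
    }

    public static void main(String[] args) {
        Map<String, Integer> clusterOf = Map.of("u1", 0, "u2", 0, "u3", 1); // hypothetical cluster labels
        ToDoubleBiFunction<String, String> sameCluster = (a, b) -> {
            Integer ca = clusterOf.get(a);
            return ca != null && ca.equals(clusterOf.get(b)) ? 1.0 : 0.0;
        };
        for (double alpha : new double[] {0.5, 1.0}) {
            BlendedUserRecommender r = new BlendedUserRecommender(sameCluster, alpha);
            r.addPreference("u1", "siteA", 5);
            r.addPreference("u1", "siteB", 3);
            r.addPreference("u2", "siteA", 4);
            r.addPreference("u2", "siteC", 2);
            r.addPreference("u3", "siteA", 5);
            r.addPreference("u3", "siteC", 5);
            System.out.println("alpha=" + alpha + " -> " + r.estimatePreference("u1", "siteC"));
        }
    }
}
```

With `alpha = 0.5`, u1's estimate for siteC is 3.0. It is pulled toward u2, who is in the same cluster and rated siteC 2.0. With `alpha = 1.0` (behavior only), the estimate is 3.5.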