Mailing-List: contact user-help@mahout.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@mahout.apache.org
Received-SPF: pass (athena.apache.org: domain of srowen@gmail.com designates
 74.125.83.42 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <F57B22B2-F004-47B1-9EAA-CA199D0A7CBD@gilt.com>
References: 
 <CAO4+3i6fEa4nia3tndvOvhZxZVTSZbqhJoCjWQ1jMTZ3EGBnJw@mail.gmail.com>
	<CAFcOpXdc++zT4hoosYWmAU_TFbEMfJjjBRs7sDZmGeaNKoK2Bw@mail.gmail.com>
	<F57B22B2-F004-47B1-9EAA-CA199D0A7CBD@gilt.com>
Date: Thu, 4 Aug 2011 12:32:37 +0100
Message-ID: 
 <CAEccTyxsVXdcDwYncWT14PUacKfU55gaDOZjXiZUjnmUsqL6PQ@mail.gmail.com>
Subject: Re: Understanding Mahout Algos and Applications
From: Sean Owen <srowen@gmail.com>
To: user@mahout.apache.org
Content-Type: multipart/alternative; boundary=001636eee0f5807df804a9ac56f9

--001636eee0f5807df804a9ac56f9
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

(Josh and I had spoken separately.)

I think he's interested in perhaps learning those similarities, indeed.
As a rough-and-ready start, I'd suggested pure collaborative filtering based
on user and item associations only. Later, you can work in user-user
similarity, learned elsewhere, to improve things.
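
To make "collaborative filtering based on user and item associations only" concrete, here is a minimal sketch in plain Python. It is not Mahout's API; the function names, the co-occurrence scoring, and the toy data are all assumptions chosen for illustration. The only input is which user touched which item, with no ratings and no demographics.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(user_items):
    """Count how often each pair of items appears for the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, user_items, counts, how_many=3):
    """Score unseen items by summed co-occurrence with the user's items."""
    seen = set(user_items.get(user, ()))
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken by item name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


# Hypothetical toy data: user -> items the user engaged with.
user_items = {
    "u1": ["siteA", "siteB", "siteC"],
    "u2": ["siteA", "siteB"],
    "u3": ["siteB", "siteC", "siteD"],
    "u4": ["siteA"],
}
counts = cooccurrence(user_items)
print(recommend("u4", user_items, counts))  # ['siteB', 'siteC']
```

A user-user similarity learned elsewhere could later be used to weight each user's contribution to the counts, which is one way to read "work in user-user similarity" above.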

2011/8/4 Christopher Jordan <cjordan@gilt.com>

> I actually disagree with that statement. While Mahout is built on Hadoop,
> distributed computing is not a factor in whether or not you can model your
> data.
>
> Josh, it sounds like you already know a fair bit about your users. In that
> case, why not leverage your demographic data to group them yourself using
> domain knowledge that makes sense? For example, use the zip and age to
> group them regionally and by age group. Then you can try to build a
> recommendation engine for each group of users. If you don't know enough
> about your users to make those kinds of groups, it sounds like you might
> need to do some exploratory statistics on them.
>
> On Aug 4, 2011, at 2:21 AM, 戴清灏 wrote:
>
> > Hi,
> >    I think the core issue is not how this engine works, but whether Mahout
> > fits your data size.
> >    Mahout is built on Hadoop, which digests big data.
> >    If your data size is not that huge or is incompatible with the MapReduce
> > model, it may not be a good idea.
> >    Regards.
> >    Roger
> >
> > 2011/8/4 Josh Dulberger <jidulberger@gmail.com>
> >
> >> Hello,
> >>
> >> I have some familiarity with machine learning (in an academic setting) but
> >> am looking for some assistance on which Mahout algorithms would suit my
> >> needs.
> >>
> >> I am doing consumer behavior research at a web-marketing startup, where we
> >> generate a decent amount of data. We track behavioral data - engagement
> >> stats, view-times, feedback - and also have demographic data. We also have
> >> an inventory of items/sites, and some rudimentary (manual) categorizations.
> >>
> >> We were just approved for a data warehouse to integrate our data and I have
> >> approval to begin working on a consumer targeting platform. The core idea is
> >> to match consumers with items, testing different approaches for different
> >> classes of consumers and items. I expect to be looking at item-similarity,
> >> consumer-similarity, and hybrid models, and eventually incorporate global
> >> trends.
> >>
> >> Initially, I think we can start with a recommender engine, then develop a
> >> clustering/classifier. But I now want more insight into what kinds of
> >> questions each is best at answering and how they fit together. So far, my
> >> understanding of the difference is that recommenders accept input of users,
> >> positive/negative scoring, item, and timestamp, then output a recommendation
> >> (with variation depending on the specific algo).
> >>
> >> This leaves out demographic data (age, gender, zip, or even socioeconomic).
> >> I gather that clustering algos can incorporate this kind of data (and more)
> >> in order to find natural groupings. Is the natural connection point to find
> >> similar users and items using clustering, then feed that into a recommender?
> >> How does this feeding work? Or, if the above is at all right-headed, what
> >> are some options as to how to make the connection?
> >>
> >> I appreciate in advance any answers, ideas, insights, or even questions any
> >> of you may have.
> >>
> >> Thanks,
> >>
> >> Josh
> >>
>
>
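
The suggestion quoted above, grouping users by zip and age and then building a recommender per group, might look roughly like the following Python sketch. The segment key (first zip digit as a coarse region, ten-year age bands) and all names and data are assumptions for illustration only; real groupings would come from domain knowledge or exploratory statistics.

```python
from collections import defaultdict


def segment(zip_code, age):
    """Hypothetical segment key: first zip digit plus a 10-year age band."""
    decade = (age // 10) * 10
    return (zip_code[0], f"{decade}-{decade + 9}")


def split_by_segment(profiles, user_items):
    """Partition user -> items data into one data set per segment."""
    groups = defaultdict(dict)
    for user, (zip_code, age) in profiles.items():
        if user in user_items:  # skip users with no behavioral data
            groups[segment(zip_code, age)][user] = user_items[user]
    return dict(groups)


# Hypothetical toy data: user -> (zip, age) and user -> items.
profiles = {"u1": ("10001", 34), "u2": ("10458", 38), "u3": ("94105", 27)}
user_items = {"u1": ["siteA", "siteB"], "u2": ["siteB"], "u3": ["siteC"]}

groups = split_by_segment(profiles, user_items)
for key, data in sorted(groups.items()):
    print(key, sorted(data))
```

Each value in `groups` has the same user-to-items shape as the full data set, so a separate recommender could be trained on each one independently.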

--001636eee0f5807df804a9ac56f9--