Date: Thu, 4 Aug 2011 12:32:37 +0100
Subject: Re: Understanding Mahout Algos and Applications
From: Sean Owen
To: user@mahout.apache.org

(Josh and I had spoken separately.) I think he's interested in perhaps
learning those similarities, indeed. As a rough-and-ready start, I'd suggested
pure collaborative filtering based on user and item associations only. Later,
you can work in user-user similarity, learned elsewhere, to improve things.

2011/8/4 Christopher Jordan

> I actually disagree with that statement. While Mahout is built on Hadoop,
> distributed computing is not a factor in whether or not you can model your
> data.
>
> Josh, it sounds like you already know a fair bit about your users. In that
> case, why not leverage your demographic data to group them yourself using
> domain knowledge that makes sense? For example, use the zip and age to
> group them regionally and by age group. Then you can try to build a
> recommendation engine for each group of users. If you don't know enough
> about your users to make those kinds of groups, it sounds like you might
> need to do some exploratory statistics on them.
>
> On Aug 4, 2011, at 2:21 AM, 戴清灏 wrote:
>
> > Hi,
> > I think the core issue is not how this engine works, but whether Mahout
> > fits your data size.
> > Mahout is built on Hadoop, which digests big data.
> > If your data size is not that huge, or is incompatible with the MapReduce
> > model, it may not be a good idea.
> > Regards.
> > Roger
> >
> > 2011/8/4 Josh Dulberger
> >
> >> Hello,
> >>
> >> I have some familiarity with machine learning (in an academic setting)
> >> but am looking for some assistance on which Mahout algorithms would suit
> >> my needs.
> >>
> >> I am doing consumer behavior research at a web-marketing startup, where
> >> we generate a decent amount of data. We track behavioral data -
> >> engagement stats, view-times, feedback - and also have demographic data.
> >> We also have an inventory of items/sites, and some rudimentary (manual)
> >> categorizations.
> >>
> >> We were just approved for a data warehouse to integrate our data and I
> >> have approval to begin working on a consumer targeting platform. The
> >> core idea is to match consumers with items, testing different approaches
> >> for different classes of consumers and items. I expect to be looking at
> >> item-similarity, consumer-similarity, and hybrid models, and eventually
> >> incorporate global trends.
> >>
> >> Initially, I think we can start with a recommender engine, then develop
> >> a clustering/classifier. But I now want more insight into what kinds of
> >> questions each is best at answering and how they fit together. So far,
> >> my understanding of the difference is that recommenders accept input of
> >> users, positive/negative scoring, item, and timestamp, then output a
> >> recommendation (with variation depending on the specific algo).
> >>
> >> This leaves out demographic data (age, gender, zip, or even
> >> socioeconomic). I gather that clustering algos can incorporate this kind
> >> of data (and more) in order to find natural groupings. Is the natural
> >> connection point to find similar users and items using clustering, then
> >> feed that into a recommender? How does this feeding work? Or, if the
> >> above is at all right-headed, what are some options as to how to make
> >> the connection?
> >>
> >> I appreciate in advance any answers, ideas, insights, or even questions
> >> any of you may have.
> >>
> >> Thanks,
> >>
> >> Josh
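
The two suggestions in this thread (pure collaborative filtering on user-item associations, and a separate recommender per demographic group) can be illustrated in a few lines. This is only a rough sketch in plain Python, not Mahout's API. The function names, the sample data, and the "young"/"older" groups are all made up for illustration, and simple co-occurrence counting stands in for whatever similarity measure a real system would use.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(interactions):
    """Count, for each pair of items, how many users are associated with both."""
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)
    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return items_by_user, counts


def recommend(user, interactions, top_n=3):
    """Rank unseen items by their summed co-occurrence with the user's items."""
    items_by_user, counts = cooccurrence(interactions)
    seen = items_by_user.get(user, set())
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


def recommend_within_group(user, interactions, group_of, top_n=3):
    """Like recommend(), but only uses data from users in the same group.

    group_of is an assumed, hand-built mapping from user to group label,
    e.g. derived from zip code and age band.
    """
    target = group_of[user]
    subset = [(u, i) for u, i in interactions if group_of.get(u) == target]
    return recommend(user, subset, top_n)


# Hypothetical sample data: (user, item) associations and manual groups.
interactions = [
    ("ann", "a"), ("ann", "b"),
    ("bob", "a"), ("bob", "b"), ("bob", "c"),
    ("cat", "a"), ("cat", "d"),
    ("dan", "b"), ("dan", "c"),
]
group_of = {"ann": "young", "bob": "young", "cat": "older", "dan": "older"}

print(recommend("ann", interactions))
print(recommend_within_group("ann", interactions, group_of))
```

The per-group variant simply filters the input before running the same recommender, which is one way to read "build a recommendation engine for each group of users". Similarities learned elsewhere (for example from clustering on demographics) could later replace the hand-built grouping.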