Date: Mon, 19 Jul 2010 18:32:18 +0100
Subject: Re: Cooccurrence to align different categorization systems (many to many occurrence)
From: Sean Owen
To: user@mahout.apache.org

Yeah that's fine. You could do this too. You're not actually making
recommendations, just computing most similar items instead of most
similar users, so lots of stuff works here.

On Mon, Jul 19, 2010 at 2:55 PM, Chantal Ackermann wrote:
> Hi,
>
> mainly for the records:
>
> I've now mapped my items onto what in Mahout is called "User", and
> mapped the categories onto Mahout "Items", instead of mapping my items
> onto "Item" and the categories onto "User".
>
> I changed the plan because that way, it was easier to create the
> GenericBooleanPrefDataModel from my input. I actually think that it fits
> better that way - what's your opinion?
>
> The input to the data model looks a bit like this (I've shortened it for
> the sake of readability):
> [id=15901,title=Infamous] CAT1={3=Drama}
> [id=15888,title=Millions] CAT1={3=Drama, 4=Crime, 8=Thriller}
> [id=16421,title=The Departed] CAT1={3=Drama, 8=Thriller}
>
> NOTE that the data from the second category system is MISSING!
> (I have not yet all data accumulated, but while waiting for it I am
> preparing the code to process the similarities.) It would come as an
> additional list per item:
> CAT2={= ...}
> Where id is in a distinctly different range from the ids used for CAT1.
>
> I am using the code from Grant Ingersoll's article:
>
> // prefs is:
> // FastByIDMap with id:=itemId,
> // FastIDSet := list of CAT1 (and CAT2) ids
> DataModel dataModel = new GenericBooleanPrefDataModel(prefs);
> ItemSimilarity itemSimilarity = new LogLikelihoodSimilarity(dataModel);
> ItemBasedRecommender recommender =
>         new GenericItemBasedRecommender(dataModel, itemSimilarity);
> //Get the recommendations for the Item
> // loop over all items
> for (items in CAT1) {
>         List simItems =
>             recommender.mostSimilarItems(id, numRecs);
>         // filter out CAT1, keep only CAT2
> }
>
> I've run the code but as CAT2 is missing, currently, I am not filtering
> the results. It seems fine, from what I can tell.
>
> Thanks again for your help!
> Chantal
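
For the record, the step left as a comment in the quoted code ("filter out
CAT1, keep only CAT2") could look roughly like the sketch below. It is plain
Java and deliberately does not use Mahout, so it is not what
LogLikelihoodSimilarity does internally; it only illustrates the idea of
scoring category pairs with a log-likelihood ratio over boolean
item-to-category data and keeping CAT2 ids by id range. Everything not in the
thread is an assumption made for illustration: the class and method names,
the CAT2 ids 1001 and 1002, the threshold of 1000 separating the two id
ranges, and the rule that a pair must co-occur at least once.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CategoryAlignment {

    // Hypothetical convention: ids >= 1000 belong to CAT2, lower ids to CAT1.
    static final long CAT2_MIN_ID = 1000L;

    static boolean isCat2(long categoryId) {
        return categoryId >= CAT2_MIN_ID;
    }

    // x * ln(x), with the usual convention that 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Log-likelihood ratio (G^2) of a 2x2 contingency table:
    // k11 = items in both categories, k12 = only the first,
    // k21 = only the second, k22 = neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        long n = k11 + k12 + k21 + k22;
        double cells = xLogX(k11) + xLogX(k12) + xLogX(k21) + xLogX(k22);
        double rows = xLogX(k11 + k12) + xLogX(k21 + k22);
        double cols = xLogX(k11 + k21) + xLogX(k12 + k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (cells + xLogX(n) - rows - cols));
    }

    // For one CAT1 category, rank the CAT2 categories that co-occur with it.
    static List<Long> mostSimilarCat2(Map<Long, Set<Long>> itemCategories,
                                      long cat1Id, int howMany) {
        // Collect candidate ids, keeping only the CAT2 range.
        Set<Long> candidates = new TreeSet<>();
        for (Set<Long> cats : itemCategories.values()) {
            for (long c : cats) {
                if (isCat2(c)) {
                    candidates.add(c);
                }
            }
        }
        Map<Long, Double> scores = new LinkedHashMap<>();
        long total = itemCategories.size();
        for (long cat2Id : candidates) {
            long both = 0, onlyCat1 = 0, onlyCat2 = 0;
            for (Set<Long> cats : itemCategories.values()) {
                boolean a = cats.contains(cat1Id);
                boolean b = cats.contains(cat2Id);
                if (a && b) both++;
                else if (a) onlyCat1++;
                else if (b) onlyCat2++;
            }
            long neither = total - both - onlyCat1 - onlyCat2;
            // Sketch-level choice: require at least one co-occurrence.
            if (both > 0) {
                scores.put(cat2Id, logLikelihoodRatio(both, onlyCat1, onlyCat2, neither));
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingDouble((Long id) -> scores.get(id)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    // The three items from the thread, plus made-up CAT2 ids (1001, 1002).
    static Map<Long, Set<Long>> sampleData() {
        return Map.of(
            15901L, Set.of(3L, 1001L),
            15888L, Set.of(3L, 4L, 8L, 1001L, 1002L),
            16421L, Set.of(3L, 8L, 1002L));
    }

    public static void main(String[] args) {
        // Rank CAT2 categories for CAT1 category 8 (Thriller).
        System.out.println(mostSimilarCat2(sampleData(), 8L, 5));
    }
}
```

With this made-up data, category 8 (Thriller) ranks 1002 ahead of 1001,
because 1002 appears on exactly the same items. Note that a log-likelihood
ratio is also large for pairs that co-occur less often than chance would
suggest, so with real data it may be worth comparing observed and expected
co-occurrence before trusting a mapping.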