X-Received: by 10.50.97.102 with SMTP id dz6mr669871igb.5.1376343208329;
 Mon, 12 Aug 2013 14:33:28 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.42.154.8 with HTTP; Mon, 12 Aug 2013 14:32:58 -0700 (PDT)
In-Reply-To:
References: <4F286DFF-523E-4E6E-B2A5-962DCD0566AE@gmail.com>
 <7110A92E-30D3-4AD9-8C77-9E3424661C26@occamsmachete.com>
 <9785EC64-638C-4A86-8EA4-19D037127F02@gmail.com>
 <3B4FB088-2A71-4301-A0CF-EA7DCACB208C@gmail.com>
 <602CF353-E0DB-43F4-B1E8-2DB47F852B8A@occamsmachete.com>
 <74EB21BE-AD0E-4F4E-A4A8-319ED3532E0A@gmail.com>
 <2223DCC8-5EDC-4D92-AFF2-48D00158BA71@gmail.com>
 <27B56259-B7ED-40D4-8EFC-03F412B68A1A@gmail.com>
 <2634E8B3-46B3-48C7-A5EE-06B37E6AFF4E@occamsmachete.com>
 <337BB966-F42C-4BEF-A974-201883210E11@gmail.com>
From: Ted Dunning
Date: Mon, 12 Aug 2013 14:32:58 -0700
Message-ID:
Subject: Re: Setting up a recommender
To: "user@mahout.apache.org"
Content-Type: multipart/alternative; boundary=047d7b10ca11058d3b04e3c6e0bc
X-Virus-Checked: Checked by ClamAV on apache.org

--047d7b10ca11058d3b04e3c6e0bc
Content-Type: text/plain; charset=UTF-8

Yes. That would be interesting.


On Mon, Aug 12, 2013 at 1:25 PM, Gokhan Capan wrote:

> A little digression: Might a Matrix implementation that is backed by a
> Solr index and uses SolrJ for querying help at all for the Solr
> recommendation approach?
>
> It supports multiple fields of String, Text, or boolean flags.
>
> Best
> Gokhan
>
>
> On Wed, Aug 7, 2013 at 9:42 PM, Pat Ferrel wrote:
>
> > Also a question about user history.
> >
> > I was planning to write these into separate directories so Solr could
> > fetch them from different sources, but it occurs to me that it would be
> > better to join A and B by user ID and output a doc per user ID with three
> > fields: id, A item history, and B item history. Other fields could be
> > added for user metadata.
> >
> > Sound correct? This is what I'll do unless someone stops me.
> >
> > On Aug 7, 2013, at 11:25 AM, Pat Ferrel wrote:
> >
> > Once you have a sample or example of what you think the "log file"
> > version will look like, can you post it? It would be great to have
> > example lines for two actions with or without the same item IDs. I'll
> > make sure we can digest it.
> >
> > I thought more about the ingest part and I don't think the one-item-space
> > is actually a problem. It just means one item dictionary. A and B will
> > have the right content; all I have to do is make sure the right ranks are
> > input to the MM, Transpose, and RSJ. This in turn is only one extra count
> > of the # of items in A's item space. This should be a very easy change if
> > my thinking is correct.
> >
> >
> > On Aug 7, 2013, at 8:09 AM, Ted Dunning wrote:
> >
> > On Tue, Aug 6, 2013 at 7:57 AM, Pat Ferrel wrote:
> >
> > > 4) Adding more metadata to the Solr output will be left to the consumer
> > > for now. If there is a good data set to use we can illustrate how to do
> > > it in the project. Ted may have some data for this from MusicBrainz.
> >
> >
> > I am working on this issue now.
> >
> > The current state is that I can bring in a bunch of track names and links
> > to artist names and so on. This would provide the basic set of items
> > (artists, genres, tracks and tags).
> >
> > There is a hitch in bringing in the data needed to generate the logs,
> > since that part of MB is not Apache compatible. I am working on that
> > issue.
> >
> > Technically, the data is in a massively normalized relational form right
> > now, but it isn't terribly hard to denormalize into a form that we need.
> >
> >
>

--047d7b10ca11058d3b04e3c6e0bc--
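
For readers following the thread, the per-user join Pat describes (join A and B by user ID, one doc per user with id, A item history, and B item history) might look roughly like the sketch below. It is only an illustration under assumptions: the function name join_histories, the field names a_history and b_history, and the (user_id, item_id) tuple input are all hypothetical, and nothing here reflects the actual Mahout or Solr code discussed in the thread.

```python
from collections import defaultdict


def join_histories(a_actions, b_actions):
    """Merge two action streams into one document per user ID.

    Each input is an iterable of (user_id, item_id) pairs. The output
    field names are hypothetical placeholders, not a real Solr schema.
    """
    histories = defaultdict(lambda: {"a_history": [], "b_history": []})
    for user_id, item_id in a_actions:
        histories[user_id]["a_history"].append(item_id)
    for user_id, item_id in b_actions:
        histories[user_id]["b_history"].append(item_id)
    # Emit one doc per user, sorted by ID so the output is deterministic.
    return [
        {"id": user_id, "a_history": h["a_history"], "b_history": h["b_history"]}
        for user_id, h in sorted(histories.items())
    ]


if __name__ == "__main__":
    action_a = [("u1", "i1"), ("u2", "i3"), ("u1", "i2")]
    action_b = [("u1", "i9"), ("u3", "i4")]
    for doc in join_histories(action_a, action_b):
        print(doc)
```

A user who appears in only one of the two action streams still gets a doc, with an empty list for the other history; whether that is the desired behavior is a design choice the thread does not settle.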