From: Saikat Kanjilal
To: "solr-user@lucene.apache.org"
CC: "java-user@lucene.apache.org"
Subject: RE: Content based recommender using lucene/solr
Date: Fri, 28 Jun 2013 11:07:00 -0700

You could build a custom recommender in Mahout to accomplish this. Also,
just out of curiosity, why the content-based approach as opposed to
building a recommender based on co-occurrence? One other thing: what is
your data size? Are you looking at a scale where you need something like
Hadoop?

> From: lcguerrerocovo@gmail.com
> Date: Fri, 28 Jun 2013 13:02:00 -0500
> Subject: Re: Content based recommender using lucene/solr
> To: solr-user@lucene.apache.org
> CC: java-user@lucene.apache.org
>
> Hey saikat, thanks for your suggestion. I've looked into mahout and other
> alternatives for computing k nearest neighbors. I would have to run a job
> and compute the k nearest neighbors and track them in the index for
> retrieval. I wanted to see if this was something I could do with lucene
> using lucene's scoring function and solr's morelikethis component. The job
> you specifically mention is for item-based recommendation, which would
> require me to track the different items users have viewed. I'm looking for
> a content-based approach where I would use a distance measure to establish
> how near items are (how similar) and have some kind of training phase to
> adjust weights.
>
>
> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal wrote:
>
> > Why not just use mahout to do this, there is an item similarity algorithm
> > in mahout that does exactly this :)
> >
> >
> > https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
> >
> > You can use mahout in distributed and non-distributed mode as well.
> >
> > > From: lcguerrerocovo@gmail.com
> > > Date: Fri, 28 Jun 2013 12:16:57 -0500
> > > Subject: Content based recommender using lucene/solr
> > > To: solr-user@lucene.apache.org; java-user@lucene.apache.org
> > >
> > > Hi,
> > >
> > > I'm using lucene and solr right now in a production environment with an
> > > index of about a million docs. I'm working on a recommender that
> > > basically would list the n most similar items to the user based on the
> > > current item he is viewing.
> > >
> > > I've been thinking of using solr/lucene since I already have all docs
> > > available and I want a quick version that can be deployed while we work
> > > on a more robust recommender. How about overriding the default
> > > similarity so that it scores documents based on the euclidean distance
> > > of normalized item attributes and then using a morelikethis component
> > > to pass in the attributes of the item for which I want to generate
> > > recommendations? I know it has its issues like recomputing
> > > scores/normalization/weight application at query time which could make
> > > this idea unfeasible/impractical. I'm at a very preliminary stage right
> > > now with this and would love some suggestions from experienced users.
> > >
> > > thank you,
> > >
> > > Luis Guerrero
> >
>
>
>
> --
> Luis Carlos Guerrero Covo
> M.S. Computer Engineering
> (57) 3183542047
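
Editorial sketch (not code from the thread): the original question proposes ranking items by the Euclidean distance between their normalized attributes, with per-attribute weights tuned later. The idea can be tried outside Lucene/Solr first with a brute-force nearest-neighbor search. Everything below is an assumption made for illustration: the function names, the min-max normalization, the toy catalogue, and the default weight of 1.0 are hypothetical, and none of it uses or mirrors the Lucene Similarity or Solr MoreLikeThis APIs.

```python
import heapq
import math


def min_max_normalize(items):
    """Scale every numeric attribute to [0, 1] across the whole catalogue.

    `items` maps an item id to a dict of numeric attributes (hypothetical layout).
    Missing attributes are treated as 0.0 before scaling.
    """
    keys = sorted({k for attrs in items.values() for k in attrs})
    lo = {k: min(a.get(k, 0.0) for a in items.values()) for k in keys}
    hi = {k: max(a.get(k, 0.0) for a in items.values()) for k in keys}
    normalized = {}
    for item_id, attrs in items.items():
        normalized[item_id] = {
            # A constant attribute carries no signal, so map it to 0.0.
            k: (attrs.get(k, 0.0) - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
            for k in keys
        }
    return normalized


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; attributes without a weight default to 1.0."""
    return math.sqrt(sum(weights.get(k, 1.0) * (a[k] - b[k]) ** 2 for k in a))


def most_similar(item_id, normalized, weights, n=3):
    """Return the ids of the n items closest to `item_id` (brute force, O(N))."""
    target = normalized[item_id]
    candidates = (
        (weighted_distance(target, attrs, weights), other)
        for other, attrs in normalized.items()
        if other != item_id
    )
    return [other for _, other in heapq.nsmallest(n, candidates)]


# Toy catalogue, made up for this example.
catalogue = {
    "a": {"price": 10.0, "rating": 4.0},
    "b": {"price": 12.0, "rating": 4.5},
    "c": {"price": 100.0, "rating": 1.0},
    "d": {"price": 11.0, "rating": 4.1},
}
normalized = min_max_normalize(catalogue)
print(most_similar("a", normalized, weights={}, n=2))
```

Normalizing once, offline, sidesteps part of the query-time cost raised in the question. The linear scan per query is still the obvious bottleneck at a million documents, which is where precomputing neighbors in a batch job, as discussed in the replies, becomes attractive.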