From: Jason Rutherglen
To: mahout-user@lucene.apache.org
Date: Fri, 10 Jul 2009 15:50:09 -0700
Subject: Re: Memory and Speed Questions for Item-Based-Recommender

Interesting. So we're creating the item-item matrix using one of the Mahout
algorithms (like Taste?), then dumping it into Lucene. I don't have any
experience with the item-item matrix part, so working on an example will help
me understand it better. Showing the Lucene part may help others who work
along these lines.

On Fri, Jul 10, 2009 at 12:57 PM, Ted Dunning wrote:

> Don't think so. Sean should comment definitively.
>
> It is actually very easy to do. The output of the off-line recommendation
> process (in my case, statistical filtering of the cooccurrence matrix, in
> other cases something different) is generally a sparse matrix of item-item
> links. Each line of this sparse matrix can be considered a document in
> creating a Lucene index. You will have to use a correct analyzer and a
> line-by-line document segmenter, but that is trivial.
>
> Then recommendation is a simple query step.
>
> You guys at LinkedIn have a nice ability to present Lucene results in
> real time, so the part after getting the item-item matrix should be dead
> simple for you.
>
> On Fri, Jul 10, 2009 at 12:48 PM, Jason Rutherglen
> <jason.rutherglen@gmail.com> wrote:
>
> > Is there an example of this (using Lucene to store item-item links)
> > in Mahout? Sounds interesting.
> >
> > On Fri, Jul 10, 2009 at 11:35 AM, Ted Dunning wrote:
> >
> > > Storing the item-item links in Lucene and forming a query with recent
> > > history is a pretty easy way to get real-time recommendations. This can
> > > also get rid of the cache because standard measures applied to make
> > > Lucene fast will work on this.
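
For anyone following along, here is a rough sketch of the idea described in
the quoted thread. It is not Mahout or Lucene code: a plain in-memory inverted
index stands in for Lucene, and the function names (build_index, recommend),
the sample item IDs, and the overlap-count scoring are all assumptions made
for illustration. Each row of the sparse item-item matrix is treated as one
"document" whose terms are the linked items, and the user's recent history is
the query.

```python
from collections import defaultdict


def build_index(item_links):
    """Build an inverted index: linked item -> set of row items ("documents").

    item_links maps each item to the items it is linked to, i.e. one
    sparse row of the item-item matrix per key.
    """
    index = defaultdict(set)
    for doc_item, linked_items in item_links.items():
        for term in linked_items:
            index[term].add(doc_item)
    return index


def recommend(index, recent_history, top_n=3):
    """Rank documents by how many recent-history items they match."""
    seen = set(recent_history)
    scores = defaultdict(int)
    for term in seen:
        for doc_item in index.get(term, ()):
            if doc_item not in seen:  # skip items the user already has
                scores[doc_item] += 1
    # Highest score first; ties broken by item ID so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical sparse item-item matrix (stand-in for the off-line output).
item_links = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E"],
    "E": ["D"],
}

index = build_index(item_links)
recommendations = recommend(index, recent_history=["A", "B"])
print(recommendations)
```

A real Lucene index would apply its own relevance scoring in place of the raw
overlap count used here, and the statistical filtering step that produces the
matrix is left out entirely.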