From: Ted Dunning
Date: Thu, 1 Jul 2010 11:10:54 -0700
Subject: Re: set-similarity in mahout
To: Chen Li
Cc: Rares Vernica, mahout-dev
Reply-To: dev@mahout.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=00c09f88cfaf5f23cb048a57653f

--00c09f88cfaf5f23cb048a57653f
Content-Type: text/plain; charset=UTF-8

Rares, Chenli,

I don't have the specific classes at hand, but here are some pointers to
related items:

- the frequent item-set stuff is related
  ( http://tdunning.blogspot.com/2010/04/hadoop-user-group-aka-mahout-users.html
  and https://cwiki.apache.org/MAHOUT/parallel-frequent-pattern-mining.html )

- the recommendation system has a cooccurrence counter (recently merged
  with similar code:
  http://mail-archives.apache.org/mod_mbox/lucene-mahout-dev/201002.mbox/%3C262964957.461161266929907859.JavaMail.jira@brutus.apache.org%3E )

- the large scale SVD code includes an efficient A' A multiplier (some of
  the discussion is here: https://issues.apache.org/jira/browse/MAHOUT-180
  but this is very old and only useful as a starting point)

Other Mahouts,

This question came out of my slightly less than gracious questioning of
Chen Li and Rares after their talk at the Hadoop Summit.
This is their very gracious followup, which I have taken the liberty of
forwarding to the list to see if anybody can quickly amplify the comments
above. Does anybody have more specific pointers?

On Thu, Jul 1, 2010 at 10:12 AM, Chen Li wrote:

> Ted,
>
> I want to add my thanks to you for your questions and interest in our
> work. We would appreciate it if you could provide us with information
> about the related module in Mahout.
>
> Chen
>
> On Thu, Jul 1, 2010 at 9:35 AM, Rares Vernica wrote:
> > Hello Ted,
> >
> > It was very nice meeting you at the Hadoop Summit. Thanks for your
> > feedback on our set-similarity join work. To follow up, could you point
> > us to the algorithm/module that does the equivalent of set-similarity
> > join in the mahout project?
> >
> > Thank you,
> > Rares Vernica
> > UC Irvine
> >
>

--00c09f88cfaf5f23cb048a57653f--