From: ke xie
Date: Wed, 13 Apr 2011 15:43:58 +0800
Subject: Re: How about a LSH recommender ?
To: Ted Dunning
Cc: user@mahout.apache.org

OK, I will try to implement a non-distributed one. Actually, I already have a Python version.

But I have a problem. When doing min-hash, the matrix entries must be either 1 or 0 before the hash functions are applied. So what about rating data? If the matrix is filled with ratings from 1 to 5, should we pick some threshold and convert a rating to 1 if it is above the threshold (and to 0 otherwise)?

This is the reference I read about LSH; check it out (chapter 3):
http://infolab.stanford.edu/~ullman/mmds.html

On Wed, Apr 13, 2011 at 3:25 PM, Ted Dunning wrote:

> Sure.
>
> LSH is a fine candidate for parallelism and scaling.
>
> I would recommend starting small and testing as you go rather than leaping
> into a parallelized, full-fledged implementation. Look for other open-source
> implementations of LSH algorithms.
>
> Be warned that the parameter selection for LSH can be pretty tricky (so I
> hear, anyway). You should pick a reasonable and realistic test problem so
> that you can experiment with that.
>
>
> On Wed, Apr 13, 2011 at 12:19 AM, ke xie wrote:
>
>> Can we implement one and contribute it to the Mahout project? Any
>> suggestions?

-- 
Name: Ke Xie Eddy
Research Group of Information Retrieval
State Key Laboratory of Intelligent Technology and Systems
Tsinghua University
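The thresholding idea raised in this message (turn 1 to 5 star ratings into 0/1 before min-hashing) can be sketched as follows. This is a minimal, hypothetical illustration, not Mahout code and not the author's Python version. It assumes integer item ids, a threshold of 3 chosen only for the example, and random hash functions of the form (a*x + b) mod p standing in for random row permutations, in the spirit of the chapter 3 reference. All function names are made up for this sketch.

```python
import random

PRIME = 2_147_483_647  # a large prime (2^31 - 1); item ids are assumed to be smaller


def binarize(ratings, threshold):
    """Turn {user: {item_id: rating}} into {user: set of item_ids rated above threshold}."""
    return {user: {item for item, r in items.items() if r > threshold}
            for user, items in ratings.items()}


def make_hash_funcs(num_hashes, prime=PRIME, seed=42):
    """Random functions h(x) = (a*x + b) mod prime, each simulating a row permutation."""
    rng = random.Random(seed)
    funcs = []
    for _ in range(num_hashes):
        a, b = rng.randrange(1, prime), rng.randrange(0, prime)
        # Default arguments bind the current a and b into each lambda.
        funcs.append(lambda x, a=a, b=b: (a * x + b) % prime)
    return funcs


def minhash_signature(item_set, hash_funcs, empty_value=PRIME):
    """For each hash function, keep the smallest hash over the user's items.
    Users with no items above the threshold get a constant placeholder signature."""
    return [min((h(i) for i in item_set), default=empty_value) for h in hash_funcs]


def jaccard(a, b):
    """Exact Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical 1-5 star ratings: user -> {item_id: rating}.
ratings = {
    "u1": {1: 5, 2: 4, 3: 1},
    "u2": {1: 4, 2: 5, 4: 2},
    "u3": {3: 5, 4: 4},
}
sets = binarize(ratings, threshold=3)  # u1 -> {1, 2}, u2 -> {1, 2}, u3 -> {3, 4}
hashes = make_hash_funcs(100)
sigs = {user: minhash_signature(items, hashes) for user, items in sets.items()}

for u, v in [("u1", "u2"), ("u1", "u3")]:
    print(u, v, "exact:", jaccard(sets[u], sets[v]),
          "estimated:", estimated_jaccard(sigs[u], sigs[v]))
```

Note that thresholding discards how strongly a user liked an item: with this threshold a 4 and a 5 become identical, and low ratings look the same as "not rated". If rating magnitudes matter, random-hyperplane LSH on the raw rating vectors is a different hash family that approximates cosine rather than Jaccard similarity.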
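Ted's warning that LSH parameter selection is tricky can be made concrete with the banding technique from the chapter 3 reference linked in this message. A min-hash signature of length n = b * r is split into b bands of r rows each, and two users become a candidate pair if at least one band matches exactly. For a pair with Jaccard similarity s, that happens with probability 1 - (1 - s^r)^b, and (1/b)^(1/r) is a rough approximation of the similarity at which this probability crosses 1/2. The helpers below are a hypothetical sketch that tabulates these values for a few splits of a 100-value signature; the splits are illustrative choices, not recommendations.

```python
def candidate_probability(s, rows, bands):
    """Probability that a pair with Jaccard similarity s agrees on at least one band."""
    return 1.0 - (1.0 - s ** rows) ** bands


def approx_threshold(rows, bands):
    """Rough similarity at which the candidate probability crosses 1/2: (1/b)^(1/r)."""
    return (1.0 / bands) ** (1.0 / rows)


# Three ways to split a hypothetical 100-value signature into b bands of r rows.
for bands, rows in [(50, 2), (20, 5), (10, 10)]:
    probs = ", ".join(f"s={s:.1f}: {candidate_probability(s, rows, bands):.3f}"
                      for s in (0.2, 0.4, 0.6, 0.8))
    print(f"b={bands:2d} r={rows:2d} threshold~{approx_threshold(rows, bands):.2f}  {probs}")
```

More rows per band push the threshold up (fewer false positives, more false negatives), while more bands push it down. This trade-off is why a realistic test problem, as suggested in the quoted reply, helps when choosing b and r.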