Date: Mon, 1 Mar 2010 11:48:36 -0500
Message-ID: <68432d881003010848m754a03c8vd279e7c9a90890c5@mail.gmail.com>
Subject: Namespace partitioning using Locality Sensitive Hashing
From: Ketan Dixit
To: common-dev@hadoop.apache.org

Hi,

I am a graduate student in the Computer Science department at SUNY Stony Brook. I am thinking of doing a project on Hadoop for my course "Cloud Computing", taught by Prof. Radu Sion.

While going through the "Yahoo open source projects for students" page, I found the idea "Research on new hashing schemes for filesystem namespace partitioning" interesting. As I understand it, the idea is to assign one subtree of the namespace to one namenode and another subtree to another namenode.

How is LSH better than normal hashing? With either scheme, a client or a designated namenode still has to decide which namenode to contact.

My guess is that if requests for files under the same subtree are directed to the same namenode, performance will improve, because the requests reaching a given namenode are clustered around one part of the namespace (for example, the subtree on which a client is currently operating). Is this assumption correct?

Could I get more insight on this?

Thanks,
Ketan