From: john smith
To: hbase-user@hadoop.apache.org
Date: Tue, 30 Mar 2010 08:19:26 +0530
Subject: Re: Region assignment in Hbase

J-D, thanks for your reply. I have some doubts, which I have posted inline.
Kindly help me.

On Tue, Mar 30, 2010 at 2:23 AM, Jean-Daniel Cryans wrote:

> Inline.
>
> J-D
>
> On Mon, Mar 29, 2010 at 11:45 AM, john smith wrote:
> > Hi all,
> >
> > I read the issue HBase-57
> > ( https://issues.apache.org/jira/browse/HBASE-57 ). I don't really
> > understand the use of assigning regions keeping DFS in mind. Can anyone
> > give an example use case showing its advantages?
>
> A region is composed of files, files are composed of blocks. To read
> data, you need to fetch those blocks. In HDFS you normally have access
> to 3 replicas and you fetch one of them over the network. If one of
> the replicas is on the local datanode, you don't need to go through the
> network. This means less network traffic and better response time.

Is this the scenario that occurs when serving read requests?
In the thread "Data distribution in HBase", someone mentioned that the
data hosted by a Region Server may not actually reside on the same machine,
so when asked for data, it fetches it from the machine that holds it. Am I
right? Why doesn't the data hosted by a Region Server lie on the same
machine? Doesn't the name "Region Server" imply that it holds all the
regions it serves? Is it due to splits or to restarting HBase?

> > Can map-reduce exploit its advantage in any way (if data is distributed
> > in the above manner), or is it just the read-write performance that
> > gets improved?
>
> MapReduce works in the exact same way, it always tries to put the
> computation next to where the data is. I recommend reading the
> MapReduce tutorial
> http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Overview

I guess the same case applies here. When a map is run on a Region Server,
its data may not actually lie on the same machine, so it fetches it from
the machine containing it. This reduces data locality!

> > Can someone please help me in understanding this.
> >
> > Regards
> > JS
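
To make the locality point in this thread concrete, here is a minimal toy
sketch. It does not use any HBase or HDFS API; the Block class, the host
names, and the pick_replica and locality helpers are all hypothetical and
exist only for illustration. It models a region's files as blocks with 3
replicas each and measures what fraction of blocks a server could read
without going over the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A file block and the hosts holding its replicas (hypothetical model)."""
    block_id: str
    replica_hosts: tuple


def pick_replica(block, reader_host):
    """Return (host, is_local), preferring a replica on the reader's own host."""
    if reader_host in block.replica_hosts:
        return reader_host, True
    # No local replica: fall back to the first remote one (a network read).
    return block.replica_hosts[0], False


def locality(blocks, reader_host):
    """Fraction of blocks that reader_host can read without the network."""
    if not blocks:
        return 0.0
    local = sum(1 for b in blocks if pick_replica(b, reader_host)[1])
    return local / len(blocks)


# Made-up block placement for one region; each block has 3 replicas.
region_blocks = [
    Block("b1", ("host-a", "host-b", "host-c")),
    Block("b2", ("host-a", "host-c", "host-d")),
    Block("b3", ("host-b", "host-c", "host-d")),
    Block("b4", ("host-a", "host-b", "host-d")),
]

# Served from host-a: 3 of the 4 blocks have a local replica.
print(locality(region_blocks, "host-a"))  # 0.75
# Served from host-e, which holds none of the blocks: every read is remote.
print(locality(region_blocks, "host-e"))  # 0.0
```

In this toy model, the server hosting a region and the machines storing its
blocks are independent, which is the situation the questions above describe:
the closer the two line up, the fewer reads cross the network.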