Mailing-List: contact hbase-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-user@hadoop.apache.org
MIME-Version: 1.0
In-Reply-To: <31a243e71003291353u3c777b96l4d555571f901a7c9@mail.gmail.com>
References: <68f8dc361003291145w2b020c50j43d4ff36ffe8f636@mail.gmail.com>
	 <31a243e71003291353u3c777b96l4d555571f901a7c9@mail.gmail.com>
Date: Tue, 30 Mar 2010 08:19:26 +0530
Message-ID: <68f8dc361003291949g39afca10sd20ea9644e0901db@mail.gmail.com>
Subject: Re: Region assignment in Hbase
From: john smith <js1987.smith@gmail.com>
To: hbase-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=00032555879a8481a30482fbad8e

--00032555879a8481a30482fbad8e
Content-Type: text/plain; charset=ISO-8859-1

J-D, thanks for your reply. I have some questions, which I have posted inline.
Kindly help me.

On Tue, Mar 30, 2010 at 2:23 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Inline.
>
> J-D
>
> On Mon, Mar 29, 2010 at 11:45 AM, john smith <js1987.smith@gmail.com>
> wrote:
> > Hi all,
> >
> > I read the issue HBASE-57 ( https://issues.apache.org/jira/browse/HBASE-57 ).
> > I don't really understand the use of assigning regions keeping DFS in
> > mind. Can anyone give an example use case showing its advantages?
>
> A region is composed of files, files are composed of blocks. To read
> data, you need to fetch those blocks. In HDFS you normally have access
> to 3 replicas and you fetch one of them over the network. If one of
> the replicas is on the local datanode, you don't need to go through the
> network. This means less network traffic and better response time.
>
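
To check my understanding of the read path described above, here is a
minimal sketch in Python. It is only a toy model: pick_replica,
local_read_fraction and the node names are hypothetical and are not HDFS or
HBase APIs. It assumes a reader uses a replica on its own machine when one
exists and otherwise fetches one over the network.

```python
# Toy model only: these names are hypothetical, not HDFS or HBase APIs.
def pick_replica(replica_hosts, local_host):
    """Return (host, is_local) for the replica a reader would fetch."""
    if not replica_hosts:
        raise ValueError("block has no replicas")
    if local_host in replica_hosts:
        return local_host, True      # local read, no network hop
    return replica_hosts[0], False   # remote read over the network


def local_read_fraction(blocks, local_host):
    """Fraction of blocks readable without going over the network."""
    if not blocks:
        return 0.0
    local = sum(1 for hosts in blocks if pick_replica(hosts, local_host)[1])
    return local / len(blocks)


# A region whose files span three blocks, each with 3 replicas.
region_blocks = [
    ["node1", "node2", "node3"],
    ["node2", "node4", "node5"],
    ["node1", "node4", "node6"],
]
print(local_read_fraction(region_blocks, "node1"))
```

In this made-up layout, a server on node1 finds a local replica for two of
the three blocks and has to go over the network for the third.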

Is this the scenario that occurs when serving read requests? In the thread
"Data distribution in HBase", someone mentioned that the data hosted by a
Region Server may not actually reside on the same machine, so when asked for
data, the Region Server fetches it from the machine that holds it. Am I
right? Why doesn't the data hosted by a Region Server lie on the same
machine? Doesn't the name "Region Server" imply that it holds all the regions
it serves? Is it due to splits or to restarting HBase?


>
> > Can
> > map-reduce exploit its advantage in any way (if data is distributed in the
> > above manner), or is it just the read-write performance that gets improved?
>
> MapReduce works in the exact same way, it always tries to put the
> computation next to where the data is. I recommend reading the
> MapReduce tutorial
> http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Overview
>

I guess the same case applies here. When a map task runs on a Region Server,
its data may not actually lie on the same machine, so the task fetches it from
the machine containing it. This reduces data locality!
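
To illustrate what I mean, here is a rough Python sketch of locality-aware
task placement. It is a toy model, not the actual Hadoop scheduler:
place_tasks, the split names and the host names are all hypothetical. It
assumes each map task goes to a host that stores its data when that host has
a free slot, and to any other free host otherwise.

```python
# Toy model only: hypothetical names, not the Hadoop scheduler or its API.
def place_tasks(split_hosts, free_slots):
    """Assign each split to a host, preferring one that stores its data.

    split_hosts: dict of split name -> list of hosts holding the data
    free_slots:  dict of host -> number of free map slots
    Returns a dict of split name -> (host, is_data_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is untouched
    placement = {}
    for split, hosts in split_hosts.items():
        local = [h for h in hosts if slots.get(h, 0) > 0]
        if local:
            host, is_local = local[0], True
        else:
            # No free slot next to the data: fall back to any free host.
            candidates = [h for h, n in slots.items() if n > 0]
            if not candidates:
                raise RuntimeError("no free map slots")
            host, is_local = candidates[0], False
        slots[host] -= 1
        placement[split] = (host, is_local)
    return placement


splits = {
    "region-a": ["rs1"],
    "region-b": ["rs2"],
    "region-c": ["rs1"],
}
placement = place_tasks(splits, {"rs1": 1, "rs2": 1, "rs3": 1})
for split, (host, is_local) in placement.items():
    print(split, host, "local" if is_local else "remote")
```

Here region-a and region-b run next to their data, while region-c ends up on
rs3 and has to read its data over the network because rs1 has no free slot
left.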


>
> > Can someone please help me understand this?
> >
> > Regards
> > JS
> >
>

--00032555879a8481a30482fbad8e--