hbase-user mailing list archives

From IGZ Nick <igznic...@gmail.com>
Subject Re: How does scan work internally? Does it make use of multi-threading/replication?
Date Mon, 18 Jun 2012 18:34:35 GMT
Hi Jean,

Thank you for your reply. So RS is a completely different entity when
compared to the datanode? How does RS serve the data? I can view the
region directories in HDFS. So the same region must be on 3 datanodes,
right? Then which regionserver gets to serve that region? Is it a
completely random regionserver? And if I ask that region server for all
keys from that region, will it have to come from the same HDFS datanode? As
far as I understand, in HDFS, if I stream a file, then I get the data from
a single datanode (the one closest to the client, usually). So, in HBase, I
ask for all keys in region reg1, then I get all the keys from the datanode
that is closest to the client?

Thanks for your time,

On Mon, Jun 18, 2012 at 11:53 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> A region is only served by 1 region server, and since HBase uses the
> HDFS client it doesn't have a view of the blocks layout. HBase
> currently doesn't even know about replication, it asks to read a file
> and gets some data coming from somewhere (that somewhere is determined
> by HDFS).
> Hope this helps,
> J-D
> On Mon, Jun 18, 2012 at 11:16 AM, IGZ Nick <igznick01@gmail.com> wrote:
> > Hi folks,
> >
> > Here is how I understand the scan flow (a regular sequential scan from
> > key A to key B):
> > - ZooKeeper is contacted for the RegionServer that has the -ROOT-
> >   region.
> > - The -ROOT- RS is contacted and it gets you the RS for .META.
> > - The .META. RS is contacted, and it will give you all regions for keys
> >   from A to B, e.g., A to A1 resides in reg1, A1 to A2 in reg2, A2 to B
> >   in reg3.
> >
> > Now if HDFS replication is set to 3, there must be 3 RS which will have
> > reg1, and likewise for reg2 and reg3. So how does the client figure out
> > which RS to go to? Or am I completely wrong here?
> > As a follow up, if reg2 is present in RS1, RS2 and RS3, then does the
> > client get all the data from A1 to A2 from a single RS or is there some
> > sort of splitting, like A1 to A11 can come from RS1, A11 to A12 from RS2
> > and A12 to A2 from RS3? That would be faster, right? Put another way, if
> > my scan consists of only one region, which is hosted on three
> > RegionServers, does the data come in from all 3 RS's or just one of them?
> >
> > Thanks a lot,
> > Nick
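
To make the answer in this thread concrete, here is a minimal toy sketch (not HBase code) of the lookup described above. It assumes the example layout from the original question (reg1 starts at A, reg2 at A1, reg3 at A2) and assigns each region to exactly one region server, as J-D states. The names REGIONS and regions_for_scan are hypothetical and exist only for this illustration. HDFS replication is deliberately not modeled, since the reply says HBase has no view of it.

```python
from bisect import bisect_right

# Hypothetical layout taken from the example in the thread:
# (region start key, region name, the single region server hosting it).
# HDFS replicas are not modeled: per the reply, HBase does not see them.
REGIONS = [
    ("A", "reg1", "RS1"),
    ("A1", "reg2", "RS2"),
    ("A2", "reg3", "RS3"),
]


def regions_for_scan(start_row, stop_row, regions=REGIONS):
    """Return (region, server) pairs touched by a scan over [start_row, stop_row)."""
    starts = [r[0] for r in regions]
    # Find the last region whose start key is <= start_row.
    i = max(bisect_right(starts, start_row) - 1, 0)
    hits = []
    # Walk forward while the region's start key is still before stop_row.
    while i < len(regions) and regions[i][0] < stop_row:
        hits.append((regions[i][1], regions[i][2]))
        i += 1
    return hits


if __name__ == "__main__":
    # A full scan from A to B visits each region once, via its one server.
    print(regions_for_scan("A", "B"))
    # A scan that fits inside reg2 talks to a single region server only.
    print(regions_for_scan("A11", "A12"))
```

In this model a scan that falls inside one region always resolves to one region server, which mirrors the answer given: there is no splitting of a single region's rows across several servers.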
