hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: How does scan work internally? Does it make use of multi-threading/replication?
Date Mon, 18 Jun 2012 18:23:28 GMT
A region is served by only one region server at a time. HDFS
replication applies to the files underneath, not to regions, and since
HBase reads through the HDFS client it has no view of the block layout.
HBase currently doesn't even know about replication: it asks to read a
file and gets some data coming from somewhere (that somewhere is
determined by HDFS).
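
To make the point concrete, here is a minimal sketch in plain Java (not the HBase client API) of the lookup a scan effectively performs. The class name, the map, and the RS1/RS2/RS3 assignments are all hypothetical, chosen to match the reg1/reg2/reg3 example in the question below. It only illustrates that each region maps to exactly one region server, so a scan over a single region talks to a single server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionLookupSketch {
    // Hypothetical stand-in for the region catalog:
    // region start key -> the ONE region server currently serving that region.
    static final TreeMap<String, String> REGION_TO_SERVER = new TreeMap<>();
    static {
        REGION_TO_SERVER.put("A", "RS1");   // reg1 covers [A, A1)
        REGION_TO_SERVER.put("A1", "RS2");  // reg2 covers [A1, A2)
        REGION_TO_SERVER.put("A2", "RS3");  // reg3 covers [A2, B)
    }

    // Returns, in key order, the server contacted for each region that
    // overlaps the scan range [startRow, stopRow).
    static List<String> serversForScan(String startRow, String stopRow) {
        List<String> servers = new ArrayList<>();
        // The region containing startRow is the one with the greatest start key <= startRow.
        String first = REGION_TO_SERVER.floorKey(startRow);
        if (first == null) {
            first = REGION_TO_SERVER.firstKey();
        }
        for (Map.Entry<String, String> e : REGION_TO_SERVER.tailMap(first, true).entrySet()) {
            if (e.getKey().compareTo(stopRow) >= 0) {
                break; // this region starts at or after the stop row
            }
            servers.add(e.getValue());
        }
        return servers;
    }

    public static void main(String[] args) {
        // A scan across all three regions visits one server per region, in order.
        System.out.println(serversForScan("A", "B"));
        // A scan inside a single region is answered by a single server.
        System.out.println(serversForScan("A11", "A12"));
    }
}
```

Nothing in this sketch models HDFS replicas, which mirrors the situation described above: which replica of a block gets read is decided below HBase, by HDFS.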

Hope this helps,


On Mon, Jun 18, 2012 at 11:16 AM, IGZ Nick <igznick01@gmail.com> wrote:
> Hi folks,
> Here is how I understand the scan flow (A regular sequential scan from key
> A to key B):
> - ZooKeeper is contacted for the RegionServer that has the -ROOT- region.
> - The -ROOT- RS is contacted and it gets you the RS for .META.
> - The .META. RS is contacted, and it will give you all regions for keys from A
> to B - e.g., A to A1 resides in reg1, A1 to A2 in reg2, A2 to B in reg3.
> Now if HDFS replication is set to 3, there must be 3 RS which will have
> reg1, and likewise for reg2 and reg3. So how does the client figure out
> which RS to go to? Or am I completely wrong here?
> As a follow-up, if reg2 is present in RS1, RS2 and RS3, then does the
> client get all the data from A1 to A2 from a single RS or is there some
> sort of splitting, like A1 to A11 can come from RS1, A11 to A12 from RS2 and
> A12 to A2 from RS3? That would be faster, right? Put another way, if my
> scan consists of only one region, which is hosted on three RegionServers,
> does the data come in from all 3 RSs or just one of them?
> Thanks a lot,
> Nick
