hbase-user mailing list archives

From IGZ Nick <igznic...@gmail.com>
Subject Re: How does scan work internally? Does it make use of multi-threading/replication?
Date Tue, 19 Jun 2012 01:53:10 GMT
Thanks J-D!

On Tue, Jun 19, 2012 at 12:31 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> On Mon, Jun 18, 2012 at 11:49 AM, IGZ Nick <igznick01@gmail.com> wrote:
> > Okay. Let me give a more specific example. Say I have 3 contiguous
> > regions, all served by one RS. So if I do a scan that gets data from
> > each of the regions, then everything has to come through this RS, which
> > would be slow.
>
> Why would it be slow? Because you have to scan sequentially? You have
> different options here depending on your use case, but mainly, if you
> need to go faster, you can do multiple scans in parallel. That's how it
> works when running a MapReduce job over a table.
>
> > Or is there any optimization such that contiguous regions don't end up
> > being served by the same regionserver?
>
> No, AFAIK there's no reason to do it.
>
> J-D
>
> >
> > On Tue, Jun 19, 2012 at 12:11 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >
> >> On Mon, Jun 18, 2012 at 11:34 AM, IGZ Nick <igznick01@gmail.com> wrote:
> >> > Hi Jean,
> >> >
> >> > Thank you for your reply. So the RS is a completely different entity
> >> > from the datanode?
> >>
> >> Totally.
> >>
> >> > How does the RS serve the data?
> >>
> >> That's HBase 101. I recommend you read the guide at
> >> http://hbase.apache.org/book/book.html or the book at
> >> http://ofps.oreilly.com/titles/9781449396107/ or the Bigtable paper.
> >>
> >> > I can view the
> >> > region directories in HDFS. So the same region must be on 3 datanodes,
> >> > right?
> >>
> >> Yep.
> >>
> >> > Then which regionserver gets to serve that region?
> >>
> >> HBase 101, but in short the master decides that.
> >>
> >> > Is it a
> >> > completely random regionserver?
> >>
> >> The master uses a few heuristics.
> >>
> >> > And if I ask that region server for all keys from that region, will
> >> > the data have to come from the same HDFS datanode?
> >>
> >> It depends on whether the data is there: if it is, it will be served
> >> locally; otherwise it will be fetched from another datanode. It doesn't
> >> really matter to the region server, since the HDFS client handles it
> >> transparently.
> >>
> >> > As far as I understand, in HDFS, if I stream a file, I get the data
> >> > from a single datanode (usually the one closest to the client). So in
> >> > HBase, if I ask for all keys in region reg1, do I get all the keys
> >> > from the datanode that is closest to the client?
> >>
> >> Yep
> >>
> >> J-D
> >>
>
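
J-D's suggestion to run multiple scans in parallel, instead of one sequential scan across contiguous regions, can be sketched as a small self-contained simulation. It does not use the HBase client API: `Region`, `scan_region`, `overlaps`, and `parallel_scan` are hypothetical names, each region is modeled as a sorted list of row keys covering `[start_key, end_key)`, and an empty key means an open end. The approach mirrors what a MapReduce job over a table does (one input split per region): cut the requested key range at region boundaries, scan each piece concurrently, and concatenate the results in region order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    # Hypothetical in-memory stand-in for an HBase region: rows are sorted
    # and the region covers keys in [start_key, end_key); an empty end_key
    # means "up to the end of the table".
    start_key: str
    end_key: str
    rows: List[str]


def scan_region(region: Region, start: str, stop: str) -> List[str]:
    """Sequentially scan one region, returning row keys in [start, stop)."""
    return [k for k in region.rows if start <= k and (not stop or k < stop)]


def overlaps(region: Region, start: str, stop: str) -> bool:
    """True if the region's key range intersects the scan range [start, stop)."""
    starts_before_stop = not stop or region.start_key < stop
    ends_after_start = not region.end_key or region.end_key > start
    return starts_before_stop and ends_after_start


def parallel_scan(regions: List[Region], start: str, stop: str,
                  max_workers: int = 4) -> List[str]:
    """Run one sub-scan per overlapping region concurrently, then merge.

    Regions are assumed sorted by start_key and non-overlapping, so
    concatenating per-region results in order keeps the output sorted.
    """
    targets = [r for r in regions if overlaps(r, start, stop)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, whatever order they finish in.
        parts = pool.map(lambda r: scan_region(r, start, stop), targets)
        return [key for part in parts for key in part]


if __name__ == "__main__":
    table = [
        Region("", "g", ["a", "c", "f"]),
        Region("g", "p", ["g", "k", "o"]),
        Region("p", "", ["p", "t", "z"]),
    ]
    print(parallel_scan(table, "b", "u"))  # ['c', 'f', 'g', 'k', 'o', 'p', 't']
```

In a real deployment each sub-scan would be a network round trip to a region server, which is where the concurrency pays off. If all the regions sit on one heavily loaded region server, that server can still be the bottleneck, so whether this is faster depends on the workload.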

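J-D's answer about locality (served locally if the data is on the same machine, otherwise fetched, with the HDFS client handling it transparently) can be illustrated with a simplified replica-selection sketch. Two caveats: HDFS makes this choice per block rather than once per file, so reading a large file can switch datanodes at block boundaries; and for HBase, the HDFS client doing the reading is the region server, not the application issuing the scan. The distance values (0 for the same host, 2 for the same rack, 4 otherwise) follow HDFS's usual network-topology convention, but `Replica`, `distance`, `choose_replica`, and `plan_read` are hypothetical names, and the host and rack names are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    # Hypothetical description of where one copy of a block lives.
    host: str
    rack: str


def distance(reader: Replica, replica: Replica) -> int:
    """Simplified HDFS-style network distance: 0 same host, 2 same rack, 4 otherwise."""
    if reader.host == replica.host:
        return 0
    if reader.rack == replica.rack:
        return 2
    return 4


def choose_replica(reader: Replica, replicas: List[Replica]) -> Replica:
    """Pick the closest replica of a single block (ties keep list order)."""
    return min(replicas, key=lambda r: distance(reader, r))


def plan_read(reader: Replica, blocks: List[List[Replica]]) -> List[str]:
    """For each block of a file, choose the host the reader would fetch it from."""
    return [choose_replica(reader, replicas).host for replicas in blocks]


if __name__ == "__main__":
    # The region server acting as HDFS client, assumed to also run a datanode.
    rs = Replica("rs1", "rackA")
    blocks = [
        [Replica("rs1", "rackA"), Replica("dn2", "rackB"), Replica("dn3", "rackB")],
        [Replica("dn4", "rackA"), Replica("dn2", "rackB"), Replica("dn5", "rackC")],
        [Replica("dn2", "rackB"), Replica("dn5", "rackC"), Replica("dn6", "rackC")],
    ]
    print(plan_read(rs, blocks))  # ['rs1', 'dn4', 'dn2']
```

This sketch simply breaks ties by list order; real HDFS also reacts to conditions the sketch ignores, such as a datanode failing partway through a read.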