Date: Tue, 19 Jun 2012 07:23:10 +0530
Subject: Re: How does scan work internally? Does it make use of multi-threading/replication?
From: IGZ Nick
To: user@hbase.apache.org

Thanks J-D!

On Tue, Jun 19, 2012 at 12:31 AM, Jean-Daniel Cryans wrote:

> On Mon, Jun 18, 2012 at 11:49 AM, IGZ Nick wrote:
> > Okay. Let me ask a more specific example. Say I have 3 contiguous regions,
> > all served by one RS. So if I do a scan which gets data from each of the
> > regions, then everything has to come through this RS, which would be slow.
>
> Why would it be slow? Because you have to scan sequentially? You have
> different options here depending on your use case, but mainly if you
> need to go faster you can do multiple scans in parallel. That's how it
> works when MR'ing a table.
>
> > Or is there any optimization such that contiguous regions don't end up
> > being served by the same regionserver?
>
> No, AFAIK there's no reason to do it.
>
> J-D
>
> > On Tue, Jun 19, 2012 at 12:11 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >
> >> On Mon, Jun 18, 2012 at 11:34 AM, IGZ Nick wrote:
> >> > Hi Jean,
> >> >
> >> > Thank you for your reply. So RS is a completely different entity when
> >> > compared to the datanode?
> >>
> >> Totally.
> >>
> >> > How does the RS serve the data?
> >>
> >> That's HBase 101; I recommend you read the guide
> >> http://hbase.apache.org/book/book.html or the book
> >> http://ofps.oreilly.com/titles/9781449396107/ or the Bigtable paper.
> >>
> >> > I can view the region directories in HDFS. So the same region must be
> >> > on 3 datanodes, right?
> >>
> >> Yep.
> >>
> >> > Then which regionserver gets to serve that region?
> >>
> >> HBase 101, but in short the master decides that.
> >>
> >> > Is it a completely random regionserver?
> >>
> >> The master uses a few heuristics.
> >>
> >> > And if I ask that region server for all keys from that region, will
> >> > they have to come from the same HDFS datanode?
> >>
> >> Depends on whether the data is there: if it is, it will be served
> >> locally, otherwise it will be fetched. It doesn't really matter to the
> >> region server since the HDFS client handles it transparently.
> >>
> >> > As far as I understand, in HDFS, if I stream a file, then I get the
> >> > data from a single datanode (the one closest to the client, usually).
> >> > So, in HBase, if I ask for all keys in region reg1, then I get all the
> >> > keys from the datanode that is closest to the client?
> >>
> >> Yep
> >>
> >> J-D
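J-D's suggestion to run multiple scans in parallel instead of one sequential scan can be sketched without a cluster. This is a simulation under stated assumptions, not HBase client code: the "table" is a hypothetical in-memory `TreeMap`, the "regions" are hypothetical contiguous key ranges identified by their start keys (the first one starting at the empty key), and `ParallelRegionScan`, `scanRegion`, and `scanAll` are illustrative names. Each region gets its own scan on a thread pool, and the per-region results are concatenated in region order so the output matches a single sequential scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simulation only: the "table" is an in-memory sorted map and each "region" is a
// contiguous key range identified by its start key. This is NOT the HBase client API;
// all names here are hypothetical and chosen for illustration.
class ParallelRegionScan {

    // Returns the row keys in one region: startKey <= row < endKey.
    // A null endKey means the region extends to the end of the table.
    static List<String> scanRegion(NavigableMap<String, String> table, String startKey, String endKey) {
        NavigableMap<String, String> slice = (endKey == null)
                ? table.tailMap(startKey, true)
                : table.subMap(startKey, true, endKey, false);
        return new ArrayList<>(slice.keySet());
    }

    // Starts one scan per region on a thread pool, then concatenates the results in
    // region order, so the output equals what a single sequential scan would return.
    // regionStartKeys must be sorted, with "" as the first entry.
    // Concurrent reads are safe here only because the map is not modified meanwhile.
    static List<String> scanAll(NavigableMap<String, String> table, List<String> regionStartKeys, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < regionStartKeys.size(); i++) {
                String start = regionStartKeys.get(i);
                // A region ends where the next one starts; the last region is open-ended.
                String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : null;
                futures.add(pool.submit(() -> scanRegion(table, start, end)));
            }
            List<String> rows = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                try {
                    rows.addAll(f.get()); // waits for this region's scan to finish
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a region scan", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a region scan failed", e.getCause());
                }
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }
}
```

Against a real cluster, the same idea would mean one `Scan` per region, bounded by that region's start and end keys (for example as returned by `HTable.getStartEndKeys()` in the 0.94-era client, or `RegionLocator.getStartEndKeys()` in later versions). This is roughly what `TableInputFormat` does when MapReduce'ing a table: it creates one input split per region.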