Date: Wed, 29 Aug 2012 14:58:03 -0500
Subject: Re: HBase and MapReduce data locality
From: Robert Dyer
To: N Keywal
Cc: user@hadoop.apache.org

Ah, thanks for that link. I missed it while browsing the docs. The link from
there to this blog post

http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

really answers my questions! :-)

On Wed, Aug 29, 2012 at 2:38 AM, N Keywal wrote:
> Inline. Just a set of "you're right" :-).
> It's documented here:
> http://hbase.apache.org/book.html#regions.arch.locality
>
> On Wed, Aug 29, 2012 at 8:06 AM, Robert Dyer wrote:
>>
>> Ok, but does that imply that only 1 of your compute nodes is guaranteed
>> to have all of the data for any given row? The blocks will replicate,
>> but they don't necessarily all replicate to the same nodes, right?
>
> Right.
>
>> So if I have, say, 2 column families (cf1, cf2) and there are 2 physical
>> files on HDFS for those (per region), then those files are created
>> on one datanode (dn1), which will have all blocks local to that node.
>
> Yes. Nit: datanodes don't "see" files, only blocks. But the logic remains
> the same.
>
>> Once it replicates those blocks 2 more times by default, isn't it
>> possible the blocks for cf1 will go to dn2, dn3 while the blocks for
>> cf2 go to dn4, dn5?
>
> Yes, it's possible (and even likely).
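The placement behaviour confirmed in this thread can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not HDFS's actual placement code: the first replica of every block lands on the writing region server's own datanode, and the remaining replicas go to distinct, randomly chosen other datanodes. Real HDFS placement is rack-aware, which this ignores. The helper names `place_blocks` and `locality_by_node` are hypothetical, not Hadoop or HBase APIs.

```python
import random


def place_blocks(num_blocks, writer, datanodes, replication=3, rng=None):
    """Return one replica set per block.

    Simplified model (assumption): replica 1 goes on the writer's datanode,
    the other replicas on distinct, randomly chosen other datanodes.
    Rack awareness is deliberately ignored.
    """
    rng = rng or random.Random()
    others = [dn for dn in datanodes if dn != writer]
    placements = []
    for _ in range(num_blocks):
        # writer is never in `others`, so each set has exactly `replication` nodes
        replicas = {writer} | set(rng.sample(others, replication - 1))
        placements.append(replicas)
    return placements


def locality_by_node(placements, datanodes):
    """Fraction of all blocks that have a replica on each datanode."""
    total = len(placements)
    return {dn: sum(dn in r for r in placements) / total for dn in datanodes}


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
    rng = random.Random(42)
    # One region with two store files (cf1, cf2), both written via dn1.
    cf1 = place_blocks(4, "dn1", nodes, rng=rng)
    cf2 = place_blocks(4, "dn1", nodes, rng=rng)
    for dn, frac in sorted(locality_by_node(cf1 + cf2, nodes).items()):
        print(f"{dn}: {frac:.0%} of the region's blocks are local")
```

Whatever the seed, dn1 reports 100% because it holds the first replica of every block. In this model each other node holds any given block with probability 1/2, so it usually has only a subset of cf1's and cf2's blocks. That is why only one node is guaranteed full locality for a region.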