Date: Fri, 8 Aug 2014 10:17:43 +0800
Subject: Re: Large discrepancy in hdfs hbase rootdir size after copytable operation.
From: tobe
To: user@hbase.apache.org

I can't reproduce this problem when I run CopyTable. You could just run
"hadoop fs -du" to see the sizes of all the files.

On Fri, Aug 8, 2014 at 8:50 AM, Jean-Marc Spaggiari wrote:

> Hi Colin,
>
> Just to make sure.
>
> Is table A from the source cluster and not compressed, and table B in the
> destination cluster and SNAPPY compressed? Is that correct? Then the ratio
> should be the opposite. Are you able to du -h from hadoop to see if all
> regions are evenly bigger or if anything else is wrong?
>
> 2014-08-07 20:44 GMT-04:00 Colin Kincaid Williams:
>
> > I haven't yet tried to major compact table B. I will look up some
> > documentation on WALs and snapshots to find this information in the hdfs
> > filesystem tomorrow. Could it be caused by the bloomfilter existing on
> > table B, but not table A? The funny thing is the source table is smaller
> > than the destination.
> >
> > On Thu, Aug 7, 2014 at 4:50 PM, Esteban Gutierrez wrote:
> >
> > > Hi Colin,
> > >
> > > Have you verified whether the content of /a_d includes WALs and/or the
> > > content of the snapshots or the HBase archive? Have you tried to major
> > > compact table B? Does it make any difference?
> > >
> > > regards,
> > > esteban.
> > >
> > > --
> > > Cloudera, Inc.
> > >
> > > On Thu, Aug 7, 2014 at 2:00 PM, Colin Kincaid Williams wrote:
> > >
> > > > I used the copy table command to copy a database between the original
> > > > cluster A and a new cluster B. I have noticed that the rootdir is
> > > > larger than 2X the size of the original. I am trying to account for
> > > > such a large difference. The following are some details about the
> > > > table.
> > > >
> > > > I'm trying to figure out why my copied table is more than 2X the size
> > > > of the original table. Could the bloomfilter itself account for this?
> > > >
> > > > The guide I used as a reference:
> > > >
> > > > http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters
> > > >
> > > > Supposedly the original command used to create the table on cluster A:
> > > >
> > > > create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1',
> > > > COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
> > > >
> > > > How I created the target table on cluster B:
> > > >
> > > > create 'ADMd5','a',{
> > > > BLOOMFILTER => 'ROW',
> > > > VERSIONS => '1',
> > > > COMPRESSION => 'SNAPPY',
> > > > MIN_VERSIONS => '0',
> > > > SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
> > > > '/zyuFR1VmhJyF4rbWsFnEg==',
> > > > '0sZYnBd83ul58d1O8I2JnA==',
> > > > '2+03N7IicZH3ltrqZUX6kQ==',
> > > > '4+/slRQtkBDU7Px6C9MAbg==',
> > > > '6+1dGCQ/IBrCsrNQXe/9xQ==',
> > > > '7+2pvtpHUQHWkZJoouR9wQ==',
> > > > '8+4n2deXhzmrpe//2Fo6Fg==',
> > > > '9+4SKW/BmNzpL68cXwKV1Q==',
> > > > 'A+4ajStFkjEMf36cX5D9xg==',
> > > > 'B+6Zm6Kccb3l6iM2L0epxQ==',
> > > > 'C+6lKKDiOWl5qrRn72fNCw==',
> > > > 'D+6dZMyn7m+NhJ7G07gqaw==',
> > > > 'E+6BrimmrpAd92gZJ5hyMw==',
> > > > 'G+5tisu4xWZMOJnDHeYBJg==',
> > > > 'I+7fRy4dvqcM/L6dFRQk9g==',
> > > > 'J+8ECMw1zeOyjfOg/ypXJA==',
> > > > 'K+7tenLYn6a1aNLniL6tbg==']}
> > > >
> > > > How the tables now appear in hbase shell:
> > > >
> > > > table A:
> > > >
> > > > describe 'ADMd5'
> > > > DESCRIPTION                                                           ENABLED
> > > >  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'NONE',  true
> > > >  REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
> > > >  MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
> > > >  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > > > 1 row(s) in 0.0370 seconds
> > > >
> > > > table B:
> > > >
> > > > hbase(main):003:0> describe 'ADMd5'
> > > > DESCRIPTION                                                           ENABLED
> > > >  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'ROW',   true
> > > >  REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
> > > >  MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
> > > >  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > > > 1 row(s) in 0.0280 seconds
> > > >
> > > > The containing folder size in hdfs:
> > > >
> > > > table A:
> > > > sudo -u hdfs hadoop fs -dus -h /a_d
> > > > dus: DEPRECATED: Please use 'du -s' instead.
> > > > 227.4g /a_d
> > > >
> > > > table B:
> > > > sudo -u hdfs hadoop fs -dus -h /a_d
> > > > dus: DEPRECATED: Please use 'du -s' instead.
> > > > 501.0g /a_d
> > > >
> > > > https://gist.github.com/drocsid/80bba7b6b19d64fde6c2
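
The per-directory comparison suggested in this thread (run "hadoop fs -du" on
/a_d on both clusters and see where the extra space sits) could be sketched
roughly as below. This is only an illustration, not something anyone in the
thread ran. It assumes the output of "hadoop fs -du /a_d" (without -h) has
been saved as text on each cluster, and that every data line starts with a
byte count and ends with the path, which should hold whether or not the
Hadoop version prints an extra column in between. The function names
parse_du and compare are made up for this example.

```python
import posixpath


def parse_du(text):
    """Map the last path component of each entry to its size in bytes.

    Assumption: each data line looks like '<bytes> ... <path>'. Lines that
    do not start with an integer (e.g. 'Found N items' or a deprecation
    warning) are skipped.
    """
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        name = posixpath.basename(fields[-1].rstrip("/"))
        sizes[name] = int(fields[0])
    return sizes


def compare(du_a, du_b):
    """Return (name, bytes_a, bytes_b, ratio) rows, largest B/A ratio first."""
    a, b = parse_du(du_a), parse_du(du_b)
    rows = []
    for name in sorted(set(a) | set(b)):
        size_a, size_b = a.get(name, 0), b.get(name, 0)
        # An entry present only on cluster B gets an infinite ratio.
        ratio = size_b / size_a if size_a else float("inf")
        rows.append((name, size_a, size_b, ratio))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

Matching by name is only meaningful for the top-level entries under /a_d
(the table directory versus the WAL, snapshot, or archive directories that
Esteban asks about). Region directories will most likely have different
names on the two clusters, so for those it is probably easier to compare the
sorted size lists by eye.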