Date: Thu, 7 Aug 2014 14:00:37 -0700
Subject: Large discrepancy in hdfs hbase rootdir size after copytable operation.
From: Colin Kincaid Williams
To: user@hbase.apache.org

I used the CopyTable command to copy a table from the original cluster A to a new cluster B. The HBase rootdir on cluster B is now more than 2X the size of the rootdir on cluster A, and I am trying to account for such a large difference. Could the bloom filter itself account for this? The following are some details about the table.

The guide I used as a reference:
http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters

Supposedly the original command used to create the table on cluster A:

create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}

How I created the target table on cluster B:

create 'ADMd5','a',{ BLOOMFILTER => 'ROW', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0',
  SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==', '/zyuFR1VmhJyF4rbWsFnEg==', '0sZYnBd83ul58d1O8I2JnA==',
            '2+03N7IicZH3ltrqZUX6kQ==', '4+/slRQtkBDU7Px6C9MAbg==', '6+1dGCQ/IBrCsrNQXe/9xQ==',
            '7+2pvtpHUQHWkZJoouR9wQ==', '8+4n2deXhzmrpe//2Fo6Fg==', '9+4SKW/BmNzpL68cXwKV1Q==',
            'A+4ajStFkjEMf36cX5D9xg==', 'B+6Zm6Kccb3l6iM2L0epxQ==', 'C+6lKKDiOWl5qrRn72fNCw==',
            'D+6dZMyn7m+NhJ7G07gqaw==', 'E+6BrimmrpAd92gZJ5hyMw==', 'G+5tisu4xWZMOJnDHeYBJg==',
            'I+7fRy4dvqcM/L6dFRQk9g==', 'J+8ECMw1zeOyjfOg/ypXJA==', 'K+7tenLYn6a1aNLniL6tbg==']}

How the tables now appear in hbase shell:

table A:

describe 'ADMd5'
DESCRIPTION                                                                 ENABLED
 {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'NONE',        true
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0370 seconds

table B:

hbase(main):003:0> describe 'ADMd5'
DESCRIPTION                                                                 ENABLED
 {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'ROW',         true
 REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
 MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0280 seconds

The containing folder size in hdfs:

table A:

sudo -u hdfs hadoop fs -dus -h /a_d
dus: DEPRECATED: Please use 'du -s' instead.
227.4g /a_d

table B:

sudo -u hdfs hadoop fs -dus -h /a_d
dus: DEPRECATED: Please use 'du -s' instead.
501.0g /a_d

https://gist.github.com/drocsid/80bba7b6b19d64fde6c2
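
To make the comparison concrete, here is a minimal Python sketch that diffs the two column-family descriptors pasted above and computes the ratio of the two directory sizes. The descriptor strings are the describe outputs with the shell's line wrapping removed, and the helper names parse_attrs and diff_attrs are hypothetical, not part of HBase. It only reports what differs; it does not explain the size gap.

```python
import re

# describe output for table A and table B, copied from the message with the
# shell's line wrapping undone (assumption: wrapping was purely cosmetic).
DESC_A = ("{NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'NONE', "
          "REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', "
          "MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', "
          "IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}")
DESC_B = ("{NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'ROW', "
          "REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', "
          "MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', "
          "IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}")


def parse_attrs(desc):
    """Collect KEY => 'value' pairs; a repeated key (NAME) keeps its last value."""
    return dict(re.findall(r"(\w+) => '([^']*)'", desc))


def diff_attrs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


differences = diff_attrs(parse_attrs(DESC_A), parse_attrs(DESC_B))
for key, (in_a, in_b) in differences.items():
    print(f"{key}: table A = {in_a}, table B = {in_b}")

# Directory sizes in GB as reported by 'hadoop fs -dus -h /a_d' on each cluster.
size_ratio = 501.0 / 227.4
print(f"B / A size ratio: {size_ratio:.2f}")
```

With the values above, the descriptors differ in BLOOMFILTER, COMPRESSION, and VERSIONS, which also means table A as it exists now does not match the create command it was supposedly built with.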