hbase-user mailing list archives

From Colin Kincaid Williams <disc...@uw.edu>
Subject Large discrepancy in hdfs hbase rootdir size after copytable operation.
Date Thu, 07 Aug 2014 21:00:37 GMT
I used the CopyTable command to copy a table from the original cluster A
to a new cluster B. The HBase rootdir on cluster B is now more than 2X the
size of the rootdir on cluster A (501.0g versus 227.4g), and I am trying to
account for such a large difference. Could the bloom filter by itself
account for this? Details about the table follow.

The guide I used as a reference:
http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters



Supposedly the original command used to create the table on cluster A:

create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1',
COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}


How I created the target table on cluster B:

create 'ADMd5','a',{
BLOOMFILTER => 'ROW',
VERSIONS => '1',
COMPRESSION => 'SNAPPY',
MIN_VERSIONS => '0',
SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
'/zyuFR1VmhJyF4rbWsFnEg==',
'0sZYnBd83ul58d1O8I2JnA==',
'2+03N7IicZH3ltrqZUX6kQ==',
'4+/slRQtkBDU7Px6C9MAbg==',
'6+1dGCQ/IBrCsrNQXe/9xQ==',
'7+2pvtpHUQHWkZJoouR9wQ==',
'8+4n2deXhzmrpe//2Fo6Fg==',
'9+4SKW/BmNzpL68cXwKV1Q==',
'A+4ajStFkjEMf36cX5D9xg==',
'B+6Zm6Kccb3l6iM2L0epxQ==',
'C+6lKKDiOWl5qrRn72fNCw==',
'D+6dZMyn7m+NhJ7G07gqaw==',
'E+6BrimmrpAd92gZJ5hyMw==',
'G+5tisu4xWZMOJnDHeYBJg==',
'I+7fRy4dvqcM/L6dFRQk9g==',
'J+8ECMw1zeOyjfOg/ypXJA==',
'K+7tenLYn6a1aNLniL6tbg==']}


How the tables now appear in hbase shell:

table A:

describe 'ADMd5'
DESCRIPTION                                                   ENABLED
 {NAME => 'ADMd5', FAMILIES => [{NAME => 'a',                 true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
 VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647',
 BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.0370 seconds


table B:

hbase(main):003:0> describe 'ADMd5'
DESCRIPTION                                                   ENABLED
 {NAME => 'ADMd5', FAMILIES => [{NAME => 'a',                 true
 BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
 VERSIONS => '1', COMPRESSION => 'SNAPPY',
 MIN_VERSIONS => '0', TTL => '2147483647',
 BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.0280 seconds
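
To make the schema differences between the two describe outputs easier to
see, here is a rough sketch that compares the column family attributes
line by line. The helper name schema_diff is hypothetical and not part of
HBase; the attribute strings are copied by hand from the describe output
above. It only lists which attributes differ and does not by itself
explain the size difference.

```shell
# Hypothetical helper: print every attribute whose value differs between
# two "KEY => 'value', KEY => 'value'" strings (A first, then B).
schema_diff() {
  printf '%s\n%s\n' "$1" "$2" | awk -F', ' '
    NR == 1 { for (i = 1; i <= NF; i++) { split($i, kv, " => "); a[kv[1]] = kv[2] } }
    NR == 2 { for (i = 1; i <= NF; i++) { split($i, kv, " => ")
                if (a[kv[1]] != kv[2]) print kv[1] ": A=" a[kv[1]] " B=" kv[2] } }'
}

# Column family 'a' as reported by describe on each cluster.
table_a="BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'"
table_b="BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'"

schema_diff "$table_a" "$table_b"
```

On the values above this reports BLOOMFILTER, VERSIONS and COMPRESSION as
the attributes that differ, so table A as described does not match the
create command it was supposedly built with.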



The size of the containing folder in HDFS on each cluster:
table A:
sudo -u hdfs hadoop fs -dus -h /a_d
dus: DEPRECATED: Please use 'du -s' instead.
227.4g  /a_d

table B:
sudo -u hdfs hadoop fs -dus -h /a_d
dus: DEPRECATED: Please use 'du -s' instead.
501.0g  /a_d
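
As a quick check of the "more than 2X" figure, the ratio of the two
reported sizes can be computed directly. This is only a sketch: the
helper name size_ratio is hypothetical, and the inputs are the gigabyte
values printed by the commands above.

```shell
# Hypothetical helper: ratio of two sizes given in the same unit.
# Usage: size_ratio <size_on_B> <size_on_A>
size_ratio() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

size_ratio 501.0 227.4   # prints 2.20
```

The deprecation warning in the output above points to the newer form of
the same listing, which would be: hadoop fs -du -s -h /a_d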


https://gist.github.com/drocsid/80bba7b6b19d64fde6c2
