hbase-user mailing list archives

From Colin Kincaid Williams <disc...@uw.edu>
Subject Re: Large discrepancy in hdfs hbase rootdir size after copytable operation.
Date Fri, 08 Aug 2014 00:44:35 GMT
I haven't yet tried to major compact table B. I will look up some
documentation on WALs and snapshots to find this information in the hdfs
filesystem tomorrow. Could it be caused by the bloomfilter existing on
table B, but not table A? The funny thing is the source table is smaller
than the destination.
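
One way to see where the space under the rootdir goes is to list the size of each immediate child of /a_d and rank them. The following is only a rough sketch, not something from this thread: the helper name rank_by_size is made up for illustration, and it assumes `hadoop fs -du` prints the size in bytes in the first column and the path in the last column (some Hadoop versions also print a "Found N items" header or an extra middle column, which the filter tolerates).

```shell
# Hypothetical helper: rank `hadoop fs -du` output by size, largest first.
# Assumes column 1 is a byte count and the last column is the path.
rank_by_size() {
  # Keep only lines whose first field is a plain integer (skips "Found N items").
  awk '$1 ~ /^[0-9]+$/ { print $1, $NF }' \
    | sort -k1,1nr \
    | awk '{ printf "%.1f GiB\t%s\n", $1 / 1073741824, $2 }'
}

# Usage on a live cluster (not executed here):
#   sudo -u hdfs hadoop fs -du /a_d | rank_by_size
```

Running this on both clusters and comparing the per-directory totals should show whether the extra space sits in the table directory itself or in WAL, snapshot, or archive directories, whose names depend on the HBase version.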


On Thu, Aug 7, 2014 at 4:50 PM, Esteban Gutierrez <esteban@cloudera.com>
wrote:

> Hi Colin,
>
> Have you verified whether the content of /a_d includes WALs and/or the
> content of the snapshots or the HBase archive? Have you tried to major
> compact table B? Does it make any difference?
>
> regards,
> esteban.
>
>
>
> --
> Cloudera, Inc.
>
>
>
> On Thu, Aug 7, 2014 at 2:00 PM, Colin Kincaid Williams <discord@uw.edu>
> wrote:
>
> > I used the CopyTable command to copy a table from the original
> > cluster A to a new cluster B. I have noticed that the rootdir on B is
> > more than twice the size of the original. I am trying to account for
> > such a large difference. The following are some details about the table.
> >
> >
> > I'm trying to figure out why my copied table is more than 2X the size of
> > the original table. Could the bloomfilter itself account for this?
> >
> > The guide I used as a reference:
> >
> >
> > http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters
> >
> >
> >
> > Supposedly the original command used to create the table on cluster A:
> >
> > create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1',
> > COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
> >
> >
> > How I created the target table on cluster B:
> >
> > create 'ADMd5','a',{
> > BLOOMFILTER => 'ROW',
> > VERSIONS => '1',
> > COMPRESSION => 'SNAPPY',
> > MIN_VERSIONS => '0',
> > SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
> > '/zyuFR1VmhJyF4rbWsFnEg==',
> > '0sZYnBd83ul58d1O8I2JnA==',
> > '2+03N7IicZH3ltrqZUX6kQ==',
> > '4+/slRQtkBDU7Px6C9MAbg==',
> > '6+1dGCQ/IBrCsrNQXe/9xQ==',
> > '7+2pvtpHUQHWkZJoouR9wQ==',
> > '8+4n2deXhzmrpe//2Fo6Fg==',
> > '9+4SKW/BmNzpL68cXwKV1Q==',
> > 'A+4ajStFkjEMf36cX5D9xg==',
> > 'B+6Zm6Kccb3l6iM2L0epxQ==',
> > 'C+6lKKDiOWl5qrRn72fNCw==',
> > 'D+6dZMyn7m+NhJ7G07gqaw==',
> > 'E+6BrimmrpAd92gZJ5hyMw==',
> > 'G+5tisu4xWZMOJnDHeYBJg==',
> > 'I+7fRy4dvqcM/L6dFRQk9g==',
> > 'J+8ECMw1zeOyjfOg/ypXJA==',
> > 'K+7tenLYn6a1aNLniL6tbg==']}
> >
> >
> > How the tables now appear in hbase shell:
> >
> > table A:
> >
> > describe 'ADMd5'
> > DESCRIPTION
> >  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'NONE',
> >  REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
> >  MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
> >  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > ENABLED
> >  true
> >
> >
> > 1 row(s) in 0.0370 seconds
> >
> >
> > table B:
> >
> > hbase(main):003:0> describe 'ADMd5'
> > DESCRIPTION
> >  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'ROW',
> >  REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
> >  MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
> >  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > ENABLED
> >  true
> >
> >
> > 1 row(s) in 0.0280 seconds
> >
> >
> >
> > The size of the containing folder in HDFS:
> > table A:
> > sudo -u hdfs hadoop fs -dus -h /a_d
> > dus: DEPRECATED: Please use 'du -s' instead.
> > 227.4g  /a_d
> >
> > table B:
> > sudo -u hdfs hadoop fs -dus -h /a_d
> > dus: DEPRECATED: Please use 'du -s' instead.
> > 501.0g  /a_d
> >
> >
> > https://gist.github.com/drocsid/80bba7b6b19d64fde6c2
> >
>
