hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Large discrepancy in hdfs hbase rootdir size after copytable operation.
Date Sat, 09 Aug 2014 04:51:58 GMT
Hi Colin,

You might want to consider upgrading. The current stable version is 0.98.4 (soon 0.98.5).

Even just going to 0.94 will give you a lot of new features, plus better stability and performance.
0.92.x can be upgraded to 0.94.x without any downtime and without any upgrade steps necessary.
For an upgrade to 0.98 or later you'd need some downtime and would also have to execute an upgrade step.
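
On the question quoted below about creating the table with splits, bloom filter, and compression: Ted's shell output shows BLOOMFILTER and VERSIONS being ignored, which suggests that a dictionary containing SPLITS is read as table-level options rather than as column family attributes. The original cluster A command keeps the family attributes in the same dictionary as NAME, so one likely fix is to do the same and pass SPLITS in a separate dictionary. The Python sketch below only builds that statement as text. The helper name hbase_create_statement is hypothetical, only the first two split keys from the thread are included, and the result has not been verified against 0.92.1, so it is worth confirming with describe afterwards.

```python
def hbase_create_statement(table, family, family_attrs, splits):
    """Build an HBase shell `create` statement as plain text (hypothetical helper)."""
    # Column family attributes live in the same dictionary as NAME,
    # mirroring the cluster A command quoted in the thread.
    attrs = [f"NAME => '{family}'"] + [f"{k} => '{v}'" for k, v in family_attrs.items()]
    family_dict = "{" + ", ".join(attrs) + "}"
    # SPLITS goes in its own dictionary, separate from the family attributes
    # (assumption: this is what keeps the shell from ignoring them).
    splits_dict = "{SPLITS => [" + ", ".join(f"'{s}'" for s in splits) + "]}"
    return f"create '{table}', {family_dict}, {splits_dict}"


statement = hbase_create_statement(
    "ADMd5",
    "a",
    {"BLOOMFILTER": "ROW", "VERSIONS": "1", "COMPRESSION": "SNAPPY", "MIN_VERSIONS": "0"},
    # Only the first two split keys from the thread, for brevity.
    ["/++ASUZm4u7YsTcF/VtK6Q==", "/zyuFR1VmhJyF4rbWsFnEg=="],
)
print(statement)
```

The printed line can be pasted into the hbase shell. Running describe 'ADMd5' afterwards should show whether BLOOMFILTER and COMPRESSION were actually applied.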


-- Lars



----- Original Message -----
From: Colin Kincaid Williams <discord@uw.edu>
To: user@hbase.apache.org
Cc: 
Sent: Friday, August 8, 2014 1:16 PM
Subject: Re: Large discrepancy in hdfs hbase rootdir size after copytable operation.

Not in the hbase shell I have:

hbase version
14/08/08 14:16:08 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.3
14/08/08 14:16:08 INFO util.VersionInfo: Subversion
file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.3
-r Unknown
14/08/08 14:16:08 INFO util.VersionInfo: Compiled by jenkins on Sat Jan 26
17:11:38 PST 2013






On Fri, Aug 8, 2014 at 12:56 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Using simplified version of your command, I saw the following in shell
> output (you may have noticed as well):
>
> An argument ignored (unknown or overridden): BLOOMFILTER
> An argument ignored (unknown or overridden): VERSIONS
> 0 row(s) in 2.1110 seconds
>
> Cheers
>
>
> On Fri, Aug 8, 2014 at 12:23 PM, Colin Kincaid Williams <discord@uw.edu>
> wrote:
>
> > I have discovered the error. I made the mistake regarding the compression
> > and the bloom filter. The new table doesn't have them enabled, and the old
> > does. However I'm wondering how I can create tables with splits and bf and
> > compression enabled. Shouldn't the following command return an error?
> >
> > hbase(main):001:0> create 'ADMd5','a',{
> >
> > hbase(main):002:1* BLOOMFILTER => 'ROW',
> > hbase(main):003:1* VERSIONS => '1',
> > hbase(main):004:1* COMPRESSION => 'SNAPPY',
> > hbase(main):005:1* MIN_VERSIONS => '0',
> > hbase(main):006:1* SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
> > hbase(main):007:2* '/zyuFR1VmhJyF4rbWsFnEg==',
> > hbase(main):008:2* '0sZYnBd83ul58d1O8I2JnA==',
> > hbase(main):009:2* '2+03N7IicZH3ltrqZUX6kQ==',
> > hbase(main):010:2* '4+/slRQtkBDU7Px6C9MAbg==',
> > hbase(main):011:2* '6+1dGCQ/IBrCsrNQXe/9xQ==',
> > hbase(main):012:2* '7+2pvtpHUQHWkZJoouR9wQ==',
> > hbase(main):013:2* '8+4n2deXhzmrpe//2Fo6Fg==',
> > hbase(main):014:2* '9+4SKW/BmNzpL68cXwKV1Q==',
> > hbase(main):015:2* 'A+4ajStFkjEMf36cX5D9xg==',
> > hbase(main):016:2* 'B+6Zm6Kccb3l6iM2L0epxQ==',
> > hbase(main):017:2* 'C+6lKKDiOWl5qrRn72fNCw==',
> > hbase(main):018:2* 'D+6dZMyn7m+NhJ7G07gqaw==',
> > hbase(main):019:2* 'E+6BrimmrpAd92gZJ5hyMw==',
> > hbase(main):020:2* 'G+5tisu4xWZMOJnDHeYBJg==',
> > hbase(main):021:2* 'I+7fRy4dvqcM/L6dFRQk9g==',
> > hbase(main):022:2* 'J+8ECMw1zeOyjfOg/ypXJA==',
> > hbase(main):023:2* 'K+7tenLYn6a1aNLniL6tbg==',]}
> > 0 row(s) in 1.8010 seconds
> >
> > hbase(main):024:0> describe 'ADMd5'
> > DESCRIPTION                                        ENABLED
> >  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a',       true
> >  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
> >  VERSIONS => '3', COMPRESSION => 'NONE',
> >  MIN_VERSIONS => '0', TTL => '2147483647',
> >  BLOCKSIZE => '65536', IN_MEMORY => 'false',
> >  BLOCKCACHE => 'true'}]}
> >
> > 1 row(s) in 0.0420 seconds
> >
> >
> >
> > On Thu, Aug 7, 2014 at 5:50 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> >
> > > Hi Colin,
> > >
> > > Just to make sure.
> > >
> > > Is table A from the source cluster and not compressed, and table B in the
> > > destination cluster and SNAPPY compressed? Is that correct? Then ratio
> > > should be the opposite. Are you able to du -h from hadoop to see if all
> > > regions are evenly bigger or if anything else is wrong?
> > >
> > >
> > > 2014-08-07 20:44 GMT-04:00 Colin Kincaid Williams <discord@uw.edu>:
> > >
> > > > I haven't yet tried to major compact table B. I will look up some
> > > > documentation on WALs and snapshots to find this information in the hdfs
> > > > filesystem tomorrow. Could it be caused by the bloomfilter existing on
> > > > table B, but not table A? The funny thing is the source table is smaller
> > > > than the destination.
> > > >
> > > >
> > > > On Thu, Aug 7, 2014 at 4:50 PM, Esteban Gutierrez <esteban@cloudera.com> wrote:
> > > >
> > > > > Hi Colin,
> > > > >
> > > > > Have you verified if the content of /a_d includes WALs and/or the content
> > > > > of the snapshots or the HBase archive? Have you tried to major compact
> > > > > table B? Does it make any difference?
> > > > >
> > > > > regards,
> > > > > esteban.
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Cloudera, Inc.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Aug 7, 2014 at 2:00 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
> > > > >
> > > > > > I used the copy table command to copy a database between the original
> > > > > > cluster A and a new cluster B. I have noticed that the rootdir is larger
> > > > > > than 2X the size of the original. I am trying to account for such a large
> > > > > > difference. The following are some details about the table.
> > > > > >
> > > > > >
> > > > > > I'm trying to figure out why my copied table is more than 2X the size of
> > > > > > the original table. Could the bloomfilter itself account for this?
> > > > > >
> > > > > > The guide I used as a reference:
> > > > > >
> > > > > > http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters
> > > > > >
> > > > > >
> > > > > >
> > > > > > Supposedly the original command used to create the table on cluster A:
> > > > > >
> > > > > > create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1',
> > > > > > COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
> > > > > >
> > > > > >
> > > > > > How I created the target table on cluster B:
> > > > > >
> > > > > > create 'ADMd5','a',{
> > > > > >
> > > > > >
> > > > > >
> > > > > > BLOOMFILTER => 'ROW',
> > > > > > VERSIONS => '1',
> > > > > > COMPRESSION => 'SNAPPY',
> > > > > > MIN_VERSIONS => '0',
> > > > > > SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
> > > > > > '/zyuFR1VmhJyF4rbWsFnEg==',
> > > > > > '0sZYnBd83ul58d1O8I2JnA==',
> > > > > > '2+03N7IicZH3ltrqZUX6kQ==',
> > > > > > '4+/slRQtkBDU7Px6C9MAbg==',
> > > > > > '6+1dGCQ/IBrCsrNQXe/9xQ==',
> > > > > > '7+2pvtpHUQHWkZJoouR9wQ==',
> > > > > > '8+4n2deXhzmrpe//2Fo6Fg==',
> > > > > > '9+4SKW/BmNzpL68cXwKV1Q==',
> > > > > > 'A+4ajStFkjEMf36cX5D9xg==',
> > > > > > 'B+6Zm6Kccb3l6iM2L0epxQ==',
> > > > > > 'C+6lKKDiOWl5qrRn72fNCw==',
> > > > > > 'D+6dZMyn7m+NhJ7G07gqaw==',
> > > > > > 'E+6BrimmrpAd92gZJ5hyMw==',
> > > > > > 'G+5tisu4xWZMOJnDHeYBJg==',
> > > > > > 'I+7fRy4dvqcM/L6dFRQk9g==',
> > > > > > 'J+8ECMw1zeOyjfOg/ypXJA==',
> > > > > > 'K+7tenLYn6a1aNLniL6tbg==']}
> > > > > >
> > > > > >
> > > > > > How the tables now appear in hbase shell:
> > > > > >
> > > > > > table A:
> > > > > >
> > > > > > describe 'ADMd5'
> > > > > > DESCRIPTION                                        ENABLED
> > > > > >  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a',       true
> > > > > >  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
> > > > > >  VERSIONS => '3', COMPRESSION => 'NONE',
> > > > > >  MIN_VERSIONS => '0', TTL => '2147483647',
> > > > > >  BLOCKSIZE => '65536', IN_MEMORY => 'false',
> > > > > >  BLOCKCACHE => 'true'}]}
> > > > > >
> > > > > >
> > > > > > 1 row(s) in 0.0370 seconds
> > > > > >
> > > > > >
> > > > > > table B:
> > > > > >
> > > > > > hbase(main):003:0> describe 'ADMd5'
> > > > > > DESCRIPTION                                        ENABLED
> > > > > >  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a',       true
> > > > > >  BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
> > > > > >  VERSIONS => '1', COMPRESSION => 'SNAPPY',
> > > > > >  MIN_VERSIONS => '0', TTL => '2147483647',
> > > > > >  BLOCKSIZE => '65536', IN_MEMORY => 'false',
> > > > > >  BLOCKCACHE => 'true'}]}
> > > > > >
> > > > > >
> > > > > > 1 row(s) in 0.0280 seconds
> > > > > >
> > > > > >
> > > > > >
> > > > > > The containing foldersize in hdfs:
> > > > > > table A:
> > > > > > sudo -u hdfs hadoop fs -dus -h /a_d
> > > > > > dus: DEPRECATED: Please use 'du -s' instead.
> > > > > > 227.4g  /a_d
> > > > > >
> > > > > > table B:
> > > > > > sudo -u hdfs hadoop fs -dus -h /a_d
> > > > > > dus: DEPRECATED: Please use 'du -s' instead.
> > > > > > 501.0g  /a_d
> > > > > >
> > > > > >
> > > > > > https://gist.github.com/drocsid/80bba7b6b19d64fde6c2

