hbase-user mailing list archives

From Partha <parthaema...@gmail.com>
Subject Re: Multiple column families - scan performance
Date Tue, 22 Aug 2017 02:47:21 GMT
hbase(main):001:0> describe 'TABLE1'
Table TABLE1 is ENABLED
TABLE1
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.2410 seconds

hbase(main):002:0> describe 'TABLE2'
Table TABLE2 is ENABLED
TABLE2
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'cf3', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'cf4', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

Here are the table definitions for TABLE1 and TABLE2.

On Mon, Aug 21, 2017 at 10:06 PM, Partha <parthaemails@gmail.com> wrote:

>       final Scan scan = new Scan(startInclusive, endExclusive)
>             .addFamily(stage.getBytes())
>             .setCaching(DEFAULT_BATCH_SIZE)
>             .setCacheBlocks(false);
>
> Here is the scan test code. It returns ~1MM rows from each table while
> limiting the scan to a single column family.
>
> Thanks.
>
> On Mon, Aug 21, 2017 at 2:16 PM, Partha <parthaemails@gmail.com> wrote:
>
>> addFamily only. There is only one column qualifier per column family.
>>
>>
>> On Aug 21, 2017 2:05 PM, "Anoop John" <anoop.hbase@gmail.com> wrote:
>>
>> In your test, are you using Scan#addColumn(byte[] family, byte[] qualifier),
>> or only addFamily(byte[] family)?
>>
>> On Mon, Aug 21, 2017 at 10:02 PM, Partha <parthaemails@gmail.com> wrote:
>> > Block cache is disabled in both scan tests. setCaching is set to 500 in
>> > both scans. HBase version is 1.1.2.2.6.0.3-8.
>> >
>> > Will post client scan test code.
>> >
>> > Thanks
>> >
>> >
>> > On Aug 21, 2017 8:57 AM, "Anoop John" <anoop.hbase@gmail.com> wrote:
>> >
>> > I was about to ask whether you had run the tests after a major
>> > compaction, but it seems you are facing the same issue there too.
>> >
>> > Which version of HBase?
>> >
>> > Is the block cache being used? What are the cache size and related configs?
>> >
>> > Can you please paste the exact client-side code used in the tests?
>> >
>> > -Anoop-
>> >
>> > On Sun, Aug 20, 2017 at 4:36 AM, Partha <parthaemails@gmail.com> wrote:
>> >> Anoop,
>> >>
>> >> Yes, each column family (in both tables) uses the same encoding
>> >> (FAST_DIFF) and the same compression (GZ).
>> >>
>> >> I suggest you try the same simple test as in my case and see if you
>> >> notice a similar drop in performance (almost linear in the number of
>> >> column families).
>> >
>> >
>>
>>
>>
>
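
As a rough aid to reading the numbers quoted above: both scans use setCaching(500) and return ~1MM rows, so a back-of-the-envelope lower bound on scanner round trips is the same for TABLE1 and TABLE2. The sketch below only does that arithmetic. The class and method names are hypothetical, and it assumes every round trip returns exactly `caching` rows; a real client can need more round trips when size-based limits cut a batch short.

```java
public class ScanRpcEstimate {

    /**
     * Lower bound on scanner round trips, assuming each round trip
     * returns exactly `caching` rows (an idealization, not a measurement).
     */
    static long estimateRoundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching > 0");
        }
        // Integer ceiling division: ceil(rows / caching).
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // "~1MM rows" reported in the thread
        int caching = 500;      // setCaching value reported in the thread
        System.out.println(estimateRoundTrips(rows, caching)); // prints 2000
    }
}
```

Since the estimate is identical for both tables, the round-trip count alone would not be expected to explain a slowdown that grows with the number of column families.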
