hbase-user mailing list archives

From Partha <parthaema...@gmail.com>
Subject Re: Multiple column families - scan performance
Date Tue, 22 Aug 2017 13:17:18 GMT
One other observation: even scanning 1MM row keys (using KeyOnlyFilter and
FirstKeyOnlyFilter) takes 4x as long on the second table. No column family
is queried at all in this test.
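One possible mechanism to check against this observation is sketched below as a toy model in plain Java (not HBase code). It assumes, without verifying against the 1.1.2 source, that a Scan with no family added is served by opening a scanner on every column family's store and merging them in one sorted heap. Under that assumption a key-only scan still has to step past every family's cells for each row. The class and member names (`KeyOnlyScanModel`, `families`, `cellsVisited`) are hypothetical. The model also assumes one cell per family per row, matching the one-qualifier-per-family schema described in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical toy model, NOT HBase source. Each column family is a separate
// sorted store holding one cell per row.
// ASSUMPTION: a scan with no family selected merges every family's store
// through one heap, so all families' cells are passed even though only row
// keys are returned.
final class KeyOnlyScanModel {

    static final class ScanResult {
        final List<String> rowKeys = new ArrayList<>();
        long cellsVisited; // cells popped from the merge heap
    }

    // Cursor over one store's sorted row keys.
    private static final class Cursor {
        final String[] keys;
        int pos;
        Cursor(String[] keys) { this.keys = keys; }
        String current() { return keys[pos]; }
    }

    // Builds `count` hypothetical stores that all contain the same rows.
    static List<String[]> families(String[] rowKeys, int count) {
        List<String[]> stores = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            stores.add(rowKeys);
        }
        return stores;
    }

    // K-way merge over all stores, keeping only the first cell of each row
    // (the FirstKeyOnlyFilter idea); the remaining cells are still visited.
    static ScanResult scan(List<String[]> stores) {
        ScanResult result = new ScanResult();
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.current().compareTo(b.current()));
        for (String[] store : stores) {
            if (store.length > 0) {
                heap.add(new Cursor(store));
            }
        }
        String lastRow = null;
        while (!heap.isEmpty()) {
            Cursor cursor = heap.poll();
            result.cellsVisited++;
            String row = cursor.current();
            if (!row.equals(lastRow)) {
                result.rowKeys.add(row); // first cell seen for this row
                lastRow = row;
            }
            cursor.pos++;
            if (cursor.pos < cursor.keys.length) {
                heap.add(cursor); // re-insert positioned at its next key
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] rows = {"row1", "row2", "row3"};
        for (int f : new int[] {1, 4}) {
            ScanResult r = scan(families(rows, f));
            System.out.println(f + " family(ies): " + r.rowKeys.size()
                + " rows, " + r.cellsVisited + " cells visited");
        }
    }
}
```

Running `main` reports 3 rows in both cases, with 3 cells visited for one family and 12 for four. In this model the work grows linearly with the family count, which would be consistent with the roughly 4x gap on the four-family table. The model ignores block decoding, I/O, and any seek optimizations, so it is only a hypothesis to test.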

On Aug 21, 2017 10:47 PM, "Partha" <parthaemails@gmail.com> wrote:

> hbase(main):001:0> describe 'TABLE1'
> Table TABLE1 is ENABLED
> TABLE1
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.2410 seconds
>
> hbase(main):002:0> describe 'TABLE2'
> Table TABLE2 is ENABLED
> TABLE2
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf3', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf4', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>
> Here are the table definitions.
>
> On Mon, Aug 21, 2017 at 10:06 PM, Partha <parthaemails@gmail.com> wrote:
>
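>>       final Scan scan = new Scan(startInclusive, endExclusive) // rows in [start, end)
>>             .addFamily(stage.getBytes())    // read a single column family
>>             .setCaching(DEFAULT_BATCH_SIZE) // rows per scanner RPC (500 in these tests)
>>             .setCacheBlocks(false);         // don't add blocks read by this scan to the block cache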
>>
>> Here is the scan test code. It returns ~1MM rows from each table while
>> limiting the scan to a single column family.
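Both scans return roughly 1MM rows with the same caching value (500, as stated elsewhere in the thread), so the number of client round trips should be about the same for both tables. A back-of-the-envelope sketch, assuming each scanner RPC returns exactly `caching` rows and ignoring any server-side result-size limit; `ScannerRpcEstimate` and `rpcs` are hypothetical names:

```java
// Hypothetical estimate, not HBase code.
// ASSUMPTION: every scanner RPC returns exactly `caching` rows; any
// result-size cap on the server is ignored.
final class ScannerRpcEstimate {

    // Ceiling of rows / caching: the last RPC may return fewer rows.
    static long rpcs(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // "~1MM rows" from each table
        int caching = 500;      // setCaching value reported in the thread
        System.out.println(rpcs(rows, caching) + " RPCs per table");
    }
}
```

With these inputs it prints `2000 RPCs per table` for either table, so the RPC count by itself does not look like the source of the 4x difference.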
>>
>> Thanks.
>>
>> On Mon, Aug 21, 2017 at 2:16 PM, Partha <parthaemails@gmail.com> wrote:
>>
>>> addFamily only. There is only one column (qualifier) per column family.
>>>
>>>
>>> On Aug 21, 2017 2:05 PM, "Anoop John" <anoop.hbase@gmail.com> wrote:
>>>
>>> In your test, are you using Scan#addColumn(byte[] family, byte[]
>>> qualifier) or only addFamily(byte[] family)?
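The difference between the two selectors can be illustrated with a toy model of a single row as nested sorted maps (family to qualifier to value). This is plain Java, not the HBase client API; `FamilySelectionModel`, `selectFamily`, `selectColumn`, and the qualifier name `q` are hypothetical. With only one qualifier per family, as in this schema, both selectors return the same cells.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model, NOT the HBase client API: a row is
// family -> (qualifier -> value), kept sorted like HBase cells.
final class FamilySelectionModel {

    // Like Scan#addFamily: every qualifier stored under the family.
    static NavigableMap<String, String> selectFamily(
            Map<String, NavigableMap<String, String>> row, String family) {
        NavigableMap<String, String> out = new TreeMap<>();
        NavigableMap<String, String> cells = row.get(family);
        if (cells != null) {
            out.putAll(cells);
        }
        return out;
    }

    // Like Scan#addColumn: only the named qualifier within the family.
    static NavigableMap<String, String> selectColumn(
            Map<String, NavigableMap<String, String>> row,
            String family, String qualifier) {
        NavigableMap<String, String> out = new TreeMap<>();
        NavigableMap<String, String> cells = row.get(family);
        if (cells != null && cells.containsKey(qualifier)) {
            out.put(qualifier, cells.get(qualifier));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical row with one qualifier ("q") per family.
        Map<String, NavigableMap<String, String>> row = new TreeMap<>();
        for (String cf : new String[] {"cf1", "cf2", "cf3", "cf4"}) {
            NavigableMap<String, String> cells = new TreeMap<>();
            cells.put("q", "value-" + cf);
            row.put(cf, cells);
        }
        System.out.println(selectFamily(row, "cf1"));      // {q=value-cf1}
        System.out.println(selectColumn(row, "cf1", "q")); // {q=value-cf1}
    }
}
```

The two selectors only diverge once a family holds more than one qualifier.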
>>>
>>> On Mon, Aug 21, 2017 at 10:02 PM, Partha <parthaemails@gmail.com> wrote:
>>> > Block cache is disabled in both scan tests. setCaching is set to 500
>>> > in both scans. HBase version is 1.1.2.2.6.0.3-8
>>> >
>>> > Will post client scan test code.
>>> >
>>> > Thanks
>>> >
>>> >
>>> > On Aug 21, 2017 8:57 AM, "Anoop John" <anoop.hbase@gmail.com> wrote:
>>> >
>>> > I was about to ask whether you have run the tests after a major
>>> > compaction. But it seems you are facing the same issue there too!
>>> >
>>> > Which version of HBase?
>>> >
>>> > Is the block cache being used? What are the cache size and related configs?
>>> >
>>> > Can you please paste the exact client-side code used in the tests?
>>> >
>>> > -Anoop-
>>> >
>>> > On Sun, Aug 20, 2017 at 4:36 AM, Partha <parthaemails@gmail.com> wrote:
>>> >> Anoop,
>>> >>
>>> >> Yes, each column family (in both tables) uses the same encoding
>>> >> (fast-diff) and the same compression (gzip).
>>> >>
>>> >> I suggest you just try a simple test like mine and see if you notice a
>>> >> similar drop in performance (almost linear in the number of column
>>> >> families).
>>> >
>>> >
>>>
>>>
>>>
>>
>
