hbase-user mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: Multiple column families - scan performance
Date Tue, 22 Aug 2017 05:19:36 GMT
Can you try one more thing: instead of addFamily(), use
addColumn(byte[] fam, byte[] qual), since you are sure that there is only
one qualifier.
See how that performs. Is it slower or faster than the addFamily() scan,
and how does it compare to the 1 CF case?
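
To keep the three runs comparable (addFamily on the 4 CF table, addColumn on the 4 CF table, and the 1 CF table), something like the timing helper below could be used. It is only a sketch: the class name ScanTimer and its methods are made up for illustration and are not part of HBase. Each variant is passed in as a supplier that runs one scan and returns the number of rows it saw.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical helper, not an HBase class: times scan variants one after another.
final class ScanTimer {

    /** Outcome of one timed run: rows returned and elapsed wall-clock milliseconds. */
    static final class Timing {
        public final long rows;
        public final long millis;

        Timing(long rows, long millis) {
            this.rows = rows;
            this.millis = millis;
        }
    }

    /** Runs the supplied scan once and records its row count and elapsed time. */
    static Timing time(LongSupplier scan) {
        long start = System.nanoTime();
        long rows = scan.getAsLong();
        long millis = (System.nanoTime() - start) / 1_000_000L;
        return new Timing(rows, millis);
    }

    /** Times every named variant, preserving the order in which they were added. */
    static Map<String, Timing> compare(Map<String, LongSupplier> variants) {
        Map<String, Timing> results = new LinkedHashMap<>();
        for (Map.Entry<String, LongSupplier> entry : variants.entrySet()) {
            results.put(entry.getKey(), time(entry.getValue()));
        }
        return results;
    }
}
```

Each supplier would open a scanner on the table and count the results, for example one built from new Scan(start, stop).addFamily(fam) and one from new Scan(start, stop).addColumn(fam, qual), with the same setCaching and setCacheBlocks(false) settings in both. Comparing the row counts as well as the times gives a quick check that both variants really return the same ~1MM rows.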

Also, just to be sure: does the 4 CF table really have only one
qualifier per column family?

Regards
Ram

On Tue, Aug 22, 2017 at 8:17 AM, Partha <parthaemails@gmail.com> wrote:

> hbase(main):001:0> describe 'TABLE1'
> Table TABLE1 is ENABLED
> TABLE1
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.2410 seconds
>
> hbase(main):002:0> describe 'TABLE2'
> Table TABLE2 is ENABLED
> TABLE2
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf3', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf4', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>
> Here are the table definitions.
>
> On Mon, Aug 21, 2017 at 10:06 PM, Partha <parthaemails@gmail.com> wrote:
>
> >       final Scan scan = new Scan(startInclusive, endExclusive)
> >             .addFamily(stage.getBytes())
> >             .setCaching(DEFAULT_BATCH_SIZE)
> >             .setCacheBlocks(false);
> >
> > Here is the scan test code. This will return ~1MM rows from both tables,
> > while limiting the scan to a single column family.
> >
> > Thanks.
> >
> > On Mon, Aug 21, 2017 at 2:16 PM, Partha <parthaemails@gmail.com> wrote:
> >
> >> addFamily only. There is only 1 column/qualifier per column family.
> >>
> >>
> >> On Aug 21, 2017 2:05 PM, "Anoop John" <anoop.hbase@gmail.com> wrote:
> >>
> >> In your test, are you using Scan#addColumn(byte[] family, byte[]
> >> qualifier), or only addFamily(byte[] family)?
> >>
> >> On Mon, Aug 21, 2017 at 10:02 PM, Partha <parthaemails@gmail.com> wrote:
> >> > Block cache is disabled on both scan tests. setCaching is set to 500 in
> >> > both scans. HBase version is 1.1.2.2.6.0.3-8.
> >> >
> >> > Will post client scan test code.
> >> >
> >> > Thanks
> >> >
> >> >
> >> > On Aug 21, 2017 8:57 AM, "Anoop John" <anoop.hbase@gmail.com> wrote:
> >> >
> >> > I was about to ask whether you had run the tests after a major
> >> > compaction, but it seems you are facing the same issue there too.
> >> >
> >> > Which version of HBase?
> >> >
> >> > Is the block cache being used? What are the size and configs related
> >> > to the cache?
> >> >
> >> > Can you please paste the exact client-side code being used in the tests?
> >> >
> >> > -Anoop-
> >> >
> >> > On Sun, Aug 20, 2017 at 4:36 AM, Partha <parthaemails@gmail.com> wrote:
> >> >> Anoop,
> >> >>
> >> >> Yes, each column family (in both tables) uses the same encoding
> >> >> (fast-diff) and the same compression (gzip).
> >> >>
> >> >> I suggest you just try the same simple test as in my case and see if
> >> >> you notice a similar drop in performance (almost linear in the number
> >> >> of column families).
> >> >
> >> >
> >>
> >>
> >>
> >
>
