hbase-user mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: Multiple column families - scan performance
Date Fri, 18 Aug 2017 01:48:58 GMT
So on the 2nd table, even though there are 4 CFs, while scanning you need
only the data from a single CF. And is the CF under test similar to what you
have in the 1st table? I mean the same encoding, compression scheme
and data size? How do you build the scan for the 2nd table? I hope
you do:
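Scan s = new Scan();
s.setStartRow(startRow);  // byte[] start key (inclusive)
s.setStopRow(stopRow);    // byte[] stop key (exclusive)
s.addFamily(cf);          // byte[] name of the CF under test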

Correct?

-Anoop-

On Thu, Aug 17, 2017 at 4:42 PM, Partha <parthaemails@gmail.com> wrote:
> I have 2 HBase tables: one with a single column family, and the other with 4
> column families. Both tables are keyed by the same rowkey, and each column
> family has a single column qualifier, with a JSON string as the
> value (each JSON payload is about 10-20 KB in size). All column families use
> FAST_DIFF encoding and gzip compression.
>
> After loading about 60 million rows into each table, a scan of any single
> column family in the 2nd table takes 4x as long as a scan of the single column
> family in the 1st table. In both cases, the scanner is bounded by a start
> and stop key to scan 1 million rows. Performance did not change much even after
> running a major compaction on both tables.
>
> Though the HBase docs and other tech forums recommend not using more than 1
> column family per table, nothing I have read so far suggests that scan
> performance degrades linearly with the number of column families. Has
> anyone else experienced this, and is there a simple explanation for it?
>
> To note, the reason the second table has 4 column families is that, even though I
> only scan one column family at a time now, there are requirements to scan
> multiple column families from that table for a given set of rowkeys.
>
> Thanks for any insight into the performance question.
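For comparing the two tables, a minimal, stdlib-only sketch of a timing harness for the bounded scan described above might look like this. The class `ScanTimer` and its members are hypothetical illustrations, not part of HBase; the only HBase fact it relies on is that `ResultScanner` is an `Iterable<Result>` (one `Result` per row), so a scanner from `table.getScanner(scan)` can be passed in as-is.

```java
import java.util.Objects;

// Hypothetical helper (not part of HBase): times one full pass over a scanner.
final class ScanTimer {

    /** Row count and elapsed wall-clock time for one pass. */
    static final class Timing {
        final long rows;
        final long elapsedNanos;

        Timing(long rows, long elapsedNanos) {
            this.rows = rows;
            this.elapsedNanos = elapsedNanos;
        }

        /** Throughput in rows per second; infinite if zero elapsed time was measured. */
        double rowsPerSecond() {
            return elapsedNanos == 0 ? Double.POSITIVE_INFINITY : rows * 1e9 / elapsedNanos;
        }
    }

    /*
     * Consumes every element of rows, counting them and measuring elapsed time.
     * With HBase, usage would be along the lines of:
     *
     *   try (ResultScanner scanner = table.getScanner(scan)) {
     *       ScanTimer.Timing t = ScanTimer.time(scanner);
     *   }
     */
    static Timing time(Iterable<?> rows) {
        Objects.requireNonNull(rows, "rows");
        long count = 0;
        long start = System.nanoTime();
        for (Object ignored : rows) {
            count++;  // each element is one row returned by the scan
        }
        return new Timing(count, System.nanoTime() - start);
    }
}
```

Because the helper depends only on `Iterable`, the same code can time the single-CF table and the 4-CF table with otherwise identical `Scan` settings (same start/stop keys, one family added).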
