hbase-issues mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-18586) Multiple column families - scan performance
Date Mon, 14 Aug 2017 15:56:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-18586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser resolved HBASE-18586.
    Resolution: Invalid

Please ask questions such as these on the user@hbase.apache.org mailing list. This JIRA
instance is reserved for concrete code changes, not user support. Thanks.

> Multiple column families - scan performance
> -------------------------------------------
>                 Key: HBASE-18586
>                 URL: https://issues.apache.org/jira/browse/HBASE-18586
>             Project: HBase
>          Issue Type: Bug
>          Components: scan
>            Reporter: PS0618
> I have 2 HBase tables: one with a single column family, and the other with 4 column
> families. Both tables are keyed by the same rowkey, and each column family has a single
> column qualifier with a JSON string as its value (each JSON payload is about 10-20K in
> size). All column families use fast-diff encoding and gzip compression.
> After loading about 60 million rows into each table, a scan test on any single column
> family in the 2nd table takes 4x as long as scanning the single column family in the
> 1st table. In both cases, the scanner is bounded by a start and stop key to scan 1 million
> rows. Performance did not change much even after running a major compaction on both tables.
> Though the HBase documentation and other tech forums recommend not using more than 1
> column family per table, nothing I have read so far suggests that scan performance will
> degrade linearly with the number of column families. Has anyone else experienced this,
> and is there a simple explanation for it?
> To note, the reason the second table has 4 column families is that, even though I only
> scan one column family at a time now, there are requirements to scan multiple column
> families from that table given a set of rowkeys.
> Thanks for any insight into the performance question.
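
For readers unfamiliar with the test described above, the following is a minimal
sketch of what "a scan bounded by a start and stop key on a single column family"
selects. It is a toy in-memory model in Python, not the HBase client API and not the
reporter's code. The names bounded_scan, table2, cf1..cf4 and the rowNNNN keys are
hypothetical and chosen only for illustration. The one HBase behavior it assumes is
that rows are ordered by rowkey and that, by default, the start row is inclusive and
the stop row is exclusive. It says nothing about why the 4-family table scanned slower.

```python
from bisect import bisect_left


def bounded_scan(table, family, start_row, stop_row):
    """Yield (rowkey, value) for rows in [start_row, stop_row) that have `family`.

    `table` is a plain dict {rowkey: {family: value}}; this is a toy stand-in
    for a table, not an HBase data structure.
    """
    keys = sorted(table)               # model rows as ordered by rowkey
    lo = bisect_left(keys, start_row)  # start row is inclusive
    hi = bisect_left(keys, stop_row)   # stop row is exclusive
    for key in keys[lo:hi]:
        cells = table[key]
        if family in cells:            # return only the requested column family
            yield key, cells[family]


# Hypothetical data shaped like the second table: four families per row,
# each holding one small JSON string.
table2 = {
    f"row{i:04d}": {f"cf{n}": '{"id": %d}' % i for n in range(1, 5)}
    for i in range(10)
}

rows = list(bounded_scan(table2, "cf1", "row0002", "row0005"))
print([key for key, _ in rows])  # ['row0002', 'row0003', 'row0004']
```

The stop key row0005 is excluded, so the bounded scan returns three rows, each
carrying only the cf1 value.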

This message was sent by Atlassian JIRA