hbase-issues mailing list archives
From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-9428) Regex filters are at least an order of magnitude slower since 0.94.3
Date Wed, 04 Sep 2013 00:49:52 GMT
Jean-Daniel Cryans created HBASE-9428:
-----------------------------------------

             Summary: Regex filters are at least an order of magnitude slower since 0.94.3
                 Key: HBASE-9428
                 URL: https://issues.apache.org/jira/browse/HBASE-9428
             Project: HBase
          Issue Type: Bug
            Reporter: Jean-Daniel Cryans
             Fix For: 0.98.0, 0.94.12, 0.96.1


I found this issue while debugging a performance problem on an OpenTSDB cluster; it was basically
unusable after an upgrade from 0.94.2 to 0.94.6. The cause is HBASE-7279 (ping [~lhofhansl]).

The easiest way to reproduce it is to run a simple single-client PerformanceEvaluation (PE):

{noformat}
$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
{noformat}

Then, in the shell, run a filter scan (flush the table first and make sure it fits in your block
cache if you want stable numbers).

Pre HBASE-7279:
{noformat}
hbase(main):028:0> scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
ROW                                                 COLUMN+CELL
 0000055872                                         column=info:data, timestamp=1378248850191, value=(blanked)
1 row(s) in 1.2780 seconds
{noformat}

Post HBASE-7279:
{noformat}
hbase(main):037:0* scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
ROW                                                 COLUMN+CELL
 0000055872                                         column=info:data, timestamp=1378248850191, value=(blanked)
1 row(s) in 24.2940 seconds
{noformat}

I tried several 0.94 releases, up to 0.94.11, and the tip of 0.96. They are all slow like this.

It seems that since HBASE-7279 went in we do a lot more row matching, and running the regex
that many times gets very expensive (the same scan went from about 1.3 to about 24.3 seconds).
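
To illustrate the suspicion above, here is a minimal, self-contained Java sketch that counts
how many regex evaluations a filter scan spends to return a single row. It is not HBase code:
the class RegexScanCost, the method simulateScan, and the checksPerRow knob are hypothetical
names invented for this illustration, the 10-digit zero-padded keys simply mimic the row key
seen in the scan output, and the use of Matcher.find() is an assumption about how the
comparator matches. It only models the idea that repeating the row check multiplies the regex
work; it does not confirm that this is what HBASE-7279 changed.

```java
import java.util.regex.Pattern;

public class RegexScanCost {
    /** Result of a simulated filter scan: rows that matched and regex evaluations spent. */
    static final class ScanStats {
        final long matchedRows;
        final long regexEvaluations;

        ScanStats(long matchedRows, long regexEvaluations) {
            this.matchedRows = matchedRows;
            this.regexEvaluations = regexEvaluations;
        }
    }

    /**
     * Walks rowCount synthetic row keys and tests each one against the pattern.
     * checksPerRow is a hypothetical knob (must be >= 1): 1 models a scanner that
     * checks the row key once per row, larger values model one that repeats the check.
     */
    static ScanStats simulateScan(int rowCount, int checksPerRow, Pattern pattern) {
        long matched = 0;
        long evaluations = 0;
        for (int i = 0; i < rowCount; i++) {
            // 10-digit zero-padded key, like the 0000055872 key in the scan output.
            String rowKey = String.format("%010d", i);
            boolean rowMatches = false;
            for (int c = 0; c < checksPerRow; c++) {
                rowMatches = pattern.matcher(rowKey).find();
                evaluations++;
            }
            if (rowMatches) {
                matched++;
            }
        }
        return new ScanStats(matched, evaluations);
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("0000055872");
        ScanStats once = simulateScan(100_000, 1, pattern);
        ScanStats repeated = simulateScan(100_000, 20, pattern);
        System.out.println("one check per row: " + once.matchedRows + " row(s), "
                + once.regexEvaluations + " regex evaluations");
        System.out.println("20 checks per row: " + repeated.matchedRows + " row(s), "
                + repeated.regexEvaluations + " regex evaluations");
    }
}
```

Both runs return the same single row, but the second spends 20 times as many regex
evaluations, which is the kind of multiplier that would turn a 1 second scan into a
20+ second one if regex matching dominates the scan time.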
