Date: Wed, 4 Sep 2013 17:56:54 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-9428) Regex filters are at least an order of magnitude slower since 0.94.3

     [ https://issues.apache.org/jira/browse/HBASE-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-9428:
---------------------------------

    Attachment: 9428-trunk.txt

Same for trunk.

> Regex filters are at least an order of magnitude slower since 0.94.3
> --------------------------------------------------------------------
>
>                 Key: HBASE-9428
>                 URL: https://issues.apache.org/jira/browse/HBASE-9428
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.98.0, 0.94.12, 0.96.1
>
>         Attachments: 9428-0.94.txt, 9428-trunk.txt
>
>
> I found this issue while debugging a performance problem on an OpenTSDB cluster that was basically unusable after an upgrade from 0.94.2 to 0.94.6. It was caused by HBASE-7279 (ping [~lhofhansl]).
> The easiest way to see it is to run a simple one-client PE (PerformanceEvaluation):
> {noformat}
> $ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
> {noformat}
> Then, in the shell, do a filter scan (flush the table first and make sure it fits in your block cache if you want stable numbers).
> Pre HBASE-7279:
> {noformat}
> hbase(main):028:0> scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
> ROW                    COLUMN+CELL
>  0000055872            column=info:data, timestamp=1378248850191, value=(blanked)
> 1 row(s) in 1.2780 seconds
> {noformat}
> Post HBASE-7279:
> {noformat}
> hbase(main):037:0* scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
> ROW                    COLUMN+CELL
>  0000055872            column=info:data, timestamp=1378248850191, value=(blanked)
> 1 row(s) in 24.2940 seconds
> {noformat}
> I tried a bunch of 0.94 releases, up to 0.94.11, and the tip of 0.96. They are all slow like this.
> It seems that since that JIRA went in we do a lot more row matching, and running the regex gets super expensive.
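
To illustrate the last point in the report (more row matching means more regex runs), here is a minimal, self-contained sketch. It is not HBase code and does not reproduce the actual code path changed by HBASE-7279. The class RegexScanSketch, the method scan, and the checksPerRow parameter are hypothetical names made up for illustration, and the factor of 20 is an arbitrary assumption, not a measured value. It only counts how often the regex is evaluated when the same row key is matched once per row versus repeatedly.

```java
import java.util.regex.Pattern;

public class RegexScanSketch {

    /** Outcome of one simulated scan: rows that matched and regex evaluations performed. */
    static final class Result {
        final int matchedRows;
        final long regexEvaluations;

        Result(int matchedRows, long regexEvaluations) {
            this.matchedRows = matchedRows;
            this.regexEvaluations = regexEvaluations;
        }
    }

    /**
     * Simulates a scan over rowCount rows whose keys are zero-padded 10-digit
     * numbers (the same shape as the key 0000055872 in the report).
     * checksPerRow is a hypothetical knob: how many times the row key is
     * matched against the regex for each row.
     */
    static Result scan(int rowCount, int checksPerRow, Pattern pattern) {
        int matchedRows = 0;
        long regexEvaluations = 0;
        for (int i = 0; i < rowCount; i++) {
            String rowKey = String.format("%010d", i);
            boolean rowMatches = false;
            for (int c = 0; c < checksPerRow; c++) {
                regexEvaluations++;
                // Same key, same pattern: every repeat is redundant work.
                rowMatches = pattern.matcher(rowKey).find();
            }
            if (rowMatches) {
                matchedRows++;
            }
        }
        return new Result(matchedRows, regexEvaluations);
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("0000055872");
        Result oncePerRow = scan(100_000, 1, pattern);
        Result repeated = scan(100_000, 20, pattern); // 20 is an assumed factor
        System.out.println("once per row: " + oncePerRow.matchedRows + " row(s), "
                + oncePerRow.regexEvaluations + " regex evaluations");
        System.out.println("repeated:     " + repeated.matchedRows + " row(s), "
                + repeated.regexEvaluations + " regex evaluations");
    }
}
```

Both runs return the same single row. Only the number of regex evaluations differs, which is the kind of overhead the report suspects.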