Date: Mon, 7 Oct 2013 22:43:42 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-9428) Regex filters are at least an order of magnitude slower since 0.94.3

    [ https://issues.apache.org/jira/browse/HBASE-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788660#comment-13788660 ]

Hudson commented on HBASE-9428:
-------------------------------

SUCCESS: Integrated in HBase-0.94-security #309 (See [https://builds.apache.org/job/HBase-0.94-security/309/])
HBASE-9711 Improve HBASE-9428 - avoid copying bytes for RegexFilter unless necessary (larsh: rev 1530061)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java


> Regex filters are at least an order of magnitude slower since 0.94.3
> --------------------------------------------------------------------
>
>                 Key: HBASE-9428
>                 URL: https://issues.apache.org/jira/browse/HBASE-9428
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: Lars Hofhansl
>             Fix For: 0.98.0, 0.94.12, 0.96.0
>
>         Attachments: 9428-0.94.txt, 9428-trunk.txt
>
>
> I found this issue after debugging a performance problem on an OpenTSDB cluster; it was basically unusable after an upgrade from 0.94.2 to 0.94.6. It was caused by HBASE-7279 (ping [~lhofhansl]).
> The easiest way to see it is to run a simple 1 client PE:
> {noformat}
> $ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
> {noformat}
> Then in the shell do a filter scan (flush the table first and make sure it fits in your blockcache if you want stable numbers).
> Pre HBASE-7279:
> {noformat}
> hbase(main):028:0> scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
> ROW                  COLUMN+CELL
>  0000055872          column=info:data, timestamp=1378248850191, value=(blanked)
> 1 row(s) in 1.2780 seconds
> {noformat}
> Post HBASE-7279:
> {noformat}
> hbase(main):037:0* scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
> ROW                  COLUMN+CELL
>  0000055872          column=info:data, timestamp=1378248850191, value=(blanked)
> 1 row(s) in 24.2940 seconds
> {noformat}
> I tried a bunch of 0.94, up to 0.94.11, and the tip of 0.96. They are all slow like this.
> It seems that since that jira went in we do a lot more row matching, and running the regex gets super expensive.



--
This message was sent by Atlassian JIRA
(v6.1#6144)