Date: Wed, 18 Oct 2017 05:41:03 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Comment Edited] (HBASE-17958) Avoid passing unexpected cell to ScanQueryMatcher when optimize SEEK to SKIP

    [ https://issues.apache.org/jira/browse/HBASE-17958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208826#comment-16208826 ]

Lars Hofhansl edited comment on HBASE-17958 at 10/18/17 5:40 AM:
-----------------------------------------------------------------

I can fairly easily eliminate the repeated compare to the next indexed key. That shaves off only about 5% of the overall scan time, so not a lot, but measurable. I don't think there's any point in checking again inside the loop.

The duplicate compares in trySkipToNextRow and trySkipToNextCol are still there, and the column trackers are still there, though. Those could only be avoided by sending the fake cells all the way up to the columnTracker. I agree that wasn't nice; maybe we can come up with another way to eliminate this cost.

HBase is slow in scanning. Perhaps these micro-optimizations aren't worth it.
[~stack]

was (Author: lhofhansl):
I can fairly easily eliminate the repeated compare to the next indexed key. That shaves off only about 5% of the overall scan time, so not a lot, but measurable. I don't think there's any point in checking again inside the loop.

The duplicate compares in trySkipToNextRow and trySkipToNextCol are still there, and the column trackers are still there, though. Those could only be avoided by sending the fake cells all the way up to the columnTracker. I agree that wasn't nice; maybe we can come up with another way to eliminate this cost.

HBase is slow in scanning. Perhaps these micro-optimizations aren't worth it.

> Avoid passing unexpected cell to ScanQueryMatcher when optimize SEEK to SKIP
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-17958
>                 URL: https://issues.apache.org/jira/browse/HBASE-17958
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: 0001-add-one-ut-testWithColumnCountGetFilter.patch, HBASE-17958-branch-1.patch, HBASE-17958-branch-1.patch, HBASE-17958-branch-1.patch, HBASE-17958-branch-1.patch, HBASE-17958-v1.patch, HBASE-17958-v2.patch, HBASE-17958-v3.patch, HBASE-17958-v4.patch, HBASE-17958-v5.patch, HBASE-17958-v6.patch, HBASE-17958-v7.patch, HBASE-17958-v7.patch
>
>
> {code}
> ScanQueryMatcher.MatchCode qcode = matcher.match(cell);
> qcode = optimize(qcode, cell);
> {code}
> The optimize method may change the MatchCode from SEEK_NEXT_COL/SEEK_NEXT_ROW to SKIP, but the next cell is still passed to ScanQueryMatcher. This gives wrong results with some filters, e.g. ColumnCountGetFilter, which just counts the number of columns: if the same column is passed to this filter again, the count will be wrong. So we should avoid passing the cell to ScanQueryMatcher when optimizing SEEK to SKIP.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
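
For illustration only, the double counting described in the issue can be reproduced with a toy model. Nothing below is HBase code: SeekToSkipDemo, Cell, Code, ColumnCounter and scan are hypothetical names, and the counter is only a rough stand-in for a filter that counts columns in the way the issue says ColumnCountGetFilter does. The sketch assumes a seek to the next column has been downgraded to cell-by-cell skipping, and compares handing the skipped cells to the matcher with dropping them first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are HBase classes.
class SeekToSkipDemo {
    enum Code { INCLUDE_THEN_NEXT_COL, DONE }

    /** A cell reduced to its column qualifier and version timestamp. */
    static final class Cell {
        final String qualifier;
        final long ts;
        Cell(String qualifier, long ts) { this.qualifier = qualifier; this.ts = ts; }
    }

    /** Hypothetical column-counting matcher: treats every cell it is shown as a new column. */
    static final class ColumnCounter {
        private final int limit;
        private int columnsSeen = 0;
        ColumnCounter(int limit) { this.limit = limit; }
        Code match(Cell cell) {
            columnsSeen++;
            return columnsSeen > limit ? Code.DONE : Code.INCLUDE_THEN_NEXT_COL;
        }
    }

    /**
     * Walks the cells in order. After a cell is included, the remaining versions of
     * that column are "skipped" one by one instead of being jumped over by a seek.
     * passSkippedCells = true models the bug (skipped cells still reach the matcher),
     * passSkippedCells = false models the fix (skipped cells never reach it).
     */
    static List<String> scan(List<Cell> cells, int limit, boolean passSkippedCells) {
        ColumnCounter counter = new ColumnCounter(limit);
        List<String> included = new ArrayList<>();
        String skippingColumn = null;
        for (Cell cell : cells) {
            boolean sameColumn = cell.qualifier.equals(skippingColumn);
            if (sameColumn && !passSkippedCells) {
                continue; // fixed behaviour: drop the cell without consulting the matcher
            }
            if (counter.match(cell) == Code.DONE) {
                break;
            }
            included.add(cell.qualifier + "@" + cell.ts);
            skippingColumn = cell.qualifier;
        }
        return included;
    }

    /** Three versions of column a, two of b, one of c, newest version first. */
    static List<Cell> sampleCells() {
        return Arrays.asList(
            new Cell("a", 3), new Cell("a", 2), new Cell("a", 1),
            new Cell("b", 2), new Cell("b", 1),
            new Cell("c", 1));
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleCells(), 2, true));  // prints [a@3, a@2]
        System.out.println(scan(sampleCells(), 2, false)); // prints [a@3, b@2]
    }
}
```

With a limit of two columns, the buggy path spends both counts on column a and never reaches b, while the fixed path returns one cell each from a and b.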