Subject: [jira] [Commented] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
From: "Lars Hofhansl (JIRA)"
To: issues@hbase.apache.org
Date: Wed, 23 Oct 2013 00:20:43 +0000 (UTC)

[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802459#comment-13802459 ]

Lars Hofhansl commented on HBASE-9778:
--------------------------------------

I can explain here. A comment in the code would be misplaced and would only make it more confusing. ScanWildcardColumnTracker does the same.

The reason is that we can keep issuing SKIPs once we are past the max/min versions. That is, as long as we are inside the versions range we can issue INCLUDEs; once we are out of it, we issue SKIPs or SEEK_NEXT_COLs. With >= we would have to issue an INCLUDE_AND_SEEK_NEXT_COL instead, and would never come back to this column.
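To make the > vs. >= argument concrete, here is a minimal Java sketch. It is not code from ExplicitColumnTracker or ScanWildcardColumnTracker: the local MatchCode enum only borrows the match-code names used in the comment, the two check methods are hypothetical simplifications, and only the SKIP branch of "SKIPs or SEEK_NEXT_COLs" is modeled. It feeds the versions of a single column, newest first, through each comparison.

```java
import java.util.ArrayList;
import java.util.List;

public class VersionCheckSketch {

    // Local stand-in borrowing the match-code names from the comment;
    // this is NOT the real HBase enum, and SEEK_NEXT_COL is not modeled.
    enum MatchCode { INCLUDE, SKIP, INCLUDE_AND_SEEK_NEXT_COL }

    // Compare with '>': versions 1..maxVersions are INCLUDEs, later ones are SKIPs.
    static MatchCode checkWithGreaterThan(int count, int maxVersions) {
        return count > maxVersions ? MatchCode.SKIP : MatchCode.INCLUDE;
    }

    // Compare with '>=': the last version inside the range has to be reported as
    // INCLUDE_AND_SEEK_NEXT_COL, which commits the scanner to a seek.
    static MatchCode checkWithGreaterOrEqual(int count, int maxVersions) {
        return count >= maxVersions ? MatchCode.INCLUDE_AND_SEEK_NEXT_COL : MatchCode.INCLUDE;
    }

    // Feeds the versions of one column (newest first) through a check and records
    // the resulting codes. 'count' is the 1-based number of versions seen so far.
    static List<MatchCode> simulate(boolean useGreaterOrEqual, int versionsInColumn, int maxVersions) {
        List<MatchCode> codes = new ArrayList<>();
        for (int count = 1; count <= versionsInColumn; count++) {
            MatchCode code = useGreaterOrEqual
                    ? checkWithGreaterOrEqual(count, maxVersions)
                    : checkWithGreaterThan(count, maxVersions);
            codes.add(code);
            if (code == MatchCode.INCLUDE_AND_SEEK_NEXT_COL) {
                break; // the seek jumps over the remaining versions; we never come back
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        // VERSIONS=1 and the column holds exactly one version.
        System.out.println(simulate(false, 1, 1)); // [INCLUDE]
        System.out.println(simulate(true, 1, 1));  // [INCLUDE_AND_SEEK_NEXT_COL]
        // Hypothetical column that still holds three versions while VERSIONS=1.
        System.out.println(simulate(false, 3, 1)); // [INCLUDE, SKIP, SKIP]
        System.out.println(simulate(true, 3, 1));  // [INCLUDE_AND_SEEK_NEXT_COL]
    }
}
```

With >, a column holding a single version (the common case with VERSIONS=1) yields a plain INCLUDE and no seek at all; the >= variant reports INCLUDE_AND_SEEK_NEXT_COL on that same cell and therefore always commits to a seek.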
> Avoid seeking to next column in ExplicitColumnTracker when possible
> -------------------------------------------------------------------
>
>                 Key: HBASE-9778
>                 URL: https://issues.apache.org/jira/browse/HBASE-9778
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt
>
>
> The issue of slow seeking in ExplicitColumnTracker was brought up by [~vrodionov] on the dev list.
> My idea here is to avoid the seeking if we know that there aren't many versions to skip.
> How do we know? We'll use the column family's VERSIONS setting as a hint. If VERSIONS is set to 1 (or maybe some value < 10), we'll avoid the seek and call SKIP repeatedly.
> HBASE-9769 has some initial numbers for this approach.
> Interestingly, the speedup depends on which column(s) are selected.
> Setup: 4M rows, 5 columns each, 1 column family, 10-byte values, VERSIONS=1, everything filtered at the server with a ValueFilter. All times are in seconds.
> Without patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.5|14.3|14.6|11.1|20.3|
> With patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.4|8.9|9.9|6.4|10.0|
> Variation here was ±0.2 s.
> So with this patch scanning is up to 2x faster than without (Col 2+4: 20.3 s vs. 10.0 s), and never slower. No special hint is needed beyond declaring VERSIONS correctly.
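The VERSIONS hint can be illustrated with a toy model in Java. Everything here is hypothetical: SKIP_THRESHOLD = 10 comes from the "(or maybe some value < 10)" remark, while Cell, column(), and scanRow() are invented for illustration, and a seek is counted as one operation however far it jumps. Unlike the real tracker, the model only seeks past the current column rather than to the next requested column. It counts how many SKIPs or seeks one row costs under each strategy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class VersionsHintSketch {

    // Hypothetical cut-off taken from the "(or maybe some value < 10)" remark.
    static final int SKIP_THRESHOLD = 10;

    // Toy cell: a column qualifier plus a timestamp; one Cell per stored version.
    static final class Cell {
        final String column;
        final long ts;
        Cell(String column, long ts) { this.column = column; this.ts = ts; }
    }

    // The proposed hint: with few VERSIONS there is little to skip, so prefer SKIPs.
    static boolean preferSkip(int familyVersions) {
        return familyVersions < SKIP_THRESHOLD;
    }

    // Builds 'versions' cells for one column, newest timestamp first.
    static List<Cell> column(String name, int versions) {
        List<Cell> out = new ArrayList<>();
        for (long ts = versions; ts >= 1; ts--) {
            out.add(new Cell(name, ts));
        }
        return out;
    }

    // Toy scan of one row whose cells are sorted by column, newest version first.
    // Returns {included, skips, seeks}; a seek counts once, however far it jumps.
    static int[] scanRow(List<Cell> cells, Set<String> wanted, int familyVersions, boolean skipMode) {
        int included = 0, skips = 0, seeks = 0;
        String current = null;
        int versionsSeen = 0;
        int i = 0;
        while (i < cells.size()) {
            Cell c = cells.get(i);
            if (!c.column.equals(current)) { // first cell of a new column
                current = c.column;
                versionsSeen = 0;
            }
            versionsSeen++;
            if (wanted.contains(c.column) && versionsSeen <= familyVersions) {
                included++;
                i++;
            } else if (skipMode) {
                skips++; // SKIP: step to the very next cell
                i++;
            } else {
                seeks++; // SEEK_NEXT_COL: jump past every remaining version of this column
                while (i < cells.size() && cells.get(i).column.equals(current)) {
                    i++;
                }
            }
        }
        return new int[] {included, skips, seeks};
    }

    public static void main(String[] args) {
        Set<String> wantC2 = Collections.singleton("c2");
        // VERSIONS=1: five single-version columns, only c2 requested.
        List<Cell> narrow = new ArrayList<>();
        for (String col : new String[] {"c1", "c2", "c3", "c4", "c5"}) {
            narrow.addAll(column(col, 1));
        }
        System.out.println(preferSkip(1));                                       // true
        System.out.println(Arrays.toString(scanRow(narrow, wantC2, 1, true)));  // [1, 4, 0]
        System.out.println(Arrays.toString(scanRow(narrow, wantC2, 1, false))); // [1, 0, 4]

        // Hypothetical VERSIONS=100 family where an unwanted column holds 50 versions.
        List<Cell> deep = new ArrayList<>(column("c1", 50));
        deep.addAll(column("c2", 1));
        System.out.println(preferSkip(100));                                     // false
        System.out.println(Arrays.toString(scanRow(deep, wantC2, 100, true)));  // [1, 50, 0]
        System.out.println(Arrays.toString(scanRow(deep, wantC2, 100, false))); // [1, 0, 1]
    }
}
```

In the VERSIONS=1 row every seek advances the scanner by exactly one cell, the same distance as a SKIP, so paying for a seek (the slow operation reported in the issue) buys nothing. Only when a column can hold many versions does a single seek replace a long run of SKIPs, which is why the family's VERSIONS setting is a reasonable hint.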