Date: Wed, 4 Mar 2015 01:45:05 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346204#comment-14346204 ]

Lars Hofhansl commented on HBASE-13109:
---------------------------------------

Did some more tests with Phoenix against 0.98, including some of the tests they used to validate their optimization of always using the WildcardColumnMatcher and doing the filtering themselves, which avoids the cost of the ExplicitColumnTracker that does the seeking.

Testing with 7 columns.
One scenario had all 7 columns in the same CF; the other had each column in its own column family.

Ran two queries:
q1 = select count(1) where v3 = <> and v5 = <>
q2 = select avg(v2) where v3 = <> and v5 = <>

1CF case:
|| ||q1 w/ Phoenix opt||q1 w/o Phoenix opt||q2 w/ Phoenix opt||q2 w/o Phoenix opt||
|w/o patch|12.9|8.4|18.0|8.3|
|w/ patch|7.5|7.2|7.5|7.1|

Two observations:
# Even with the Phoenix optimization this is faster, because a bunch of SEEK_NEXT_ROWs are avoided unless they're necessary.
# The whole optimization is unnecessary now; it saves less than 10% in the *best* case, with only one version per cell.

> Make better SEEK vs SKIP decisions during scanning
> --------------------------------------------------
>
>                 Key: HBASE-13109
>                 URL: https://issues.apache.org/jira/browse/HBASE-13109
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 13109-0.98-v4.txt, 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk-v5.txt, 13109-trunk.txt, nextIndexKVChange_new.patch
>
>
> I'm re-purposing this issue to add a heuristic as to when to SEEK and when to SKIP Cells. This has come up in various issues, and I think I have a way to finally fix this now. HBASE-9778, HBASE-12311, and friends are related.
> --- Old description ---
> This is a continuation of HBASE-9778.
> We've seen a scenario of a very slow scan over a region using a timerange that happens to fall after the ts of any Cell in the region.
> Turns out we spend a lot of time seeking.
> Tested with a 5 column table, and the scan is 5x faster when the timerange falls before all Cells' ts.
> We can use the lookahead hint introduced in HBASE-9778 to do opportunistic SKIPing before we actually seek.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
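
For readers unfamiliar with the SEEK vs SKIP trade-off, the "opportunistic SKIPing before we actually seek" idea from the old description can be illustrated with a toy model. This is only a sketch under stated assumptions and not HBase code: the class LookaheadScanner, its fields, and the use of a sorted int array with distinct keys in place of Cells are all hypothetical, and a binary search stands in for the real (more expensive) seek.

```java
import java.util.Arrays;

/** Toy scanner over a sorted array of distinct keys (hypothetical, not an HBase class). */
final class LookaheadScanner {
    private final int[] keys;     // sorted, distinct keys standing in for Cells
    private final int lookahead;  // max number of SKIPs to try before falling back to a SEEK
    private int pos = 0;          // index of the current key
    int skips = 0;                // cheap sequential steps taken so far
    int seeks = 0;                // expensive seeks performed so far

    LookaheadScanner(int[] sortedKeys, int lookahead) {
        this.keys = sortedKeys.clone();
        this.lookahead = lookahead;
    }

    /** Moves to the first key >= target; returns its index, or keys.length if there is none. */
    int advanceTo(int target) {
        // Opportunistic SKIPs: step forward at most `lookahead` times.
        for (int i = 0; i < lookahead && pos < keys.length && keys[pos] < target; i++) {
            pos++;
            skips++;
        }
        // Still short of the target after the allowed SKIPs: fall back to a SEEK.
        if (pos < keys.length && keys[pos] < target) {
            int idx = Arrays.binarySearch(keys, pos, keys.length, target);
            pos = idx >= 0 ? idx : -idx - 1; // a negative result encodes the insertion point
            seeks++;
        }
        return pos;
    }

    public static void main(String[] args) {
        LookaheadScanner s = new LookaheadScanner(new int[] {1, 2, 3, 5, 8, 13, 21, 34}, 2);
        System.out.println("index for 3:  " + s.advanceTo(3));
        System.out.println("index for 30: " + s.advanceTo(30));
        System.out.println("skips=" + s.skips + ", seeks=" + s.seeks);
    }
}
```

In this model a nearby target is reached with SKIPs alone, while a distant one costs the wasted SKIPs plus one SEEK, which is the trade-off a heuristic has to balance.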