Return-Path: X-Original-To: apmail-db-derby-dev-archive@www.apache.org Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9752D10A5C for ; Sat, 20 Dec 2014 00:45:14 +0000 (UTC) Received: (qmail 50082 invoked by uid 500); 20 Dec 2014 00:45:14 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 50026 invoked by uid 500); 20 Dec 2014 00:45:14 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 49841 invoked by uid 99); 20 Dec 2014 00:45:14 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 20 Dec 2014 00:45:14 +0000 Date: Sat, 20 Dec 2014 00:45:14 +0000 (UTC) From: "Mike Matrigali (JIRA)" To: derby-dev@db.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (DERBY-6784) change optimizer to choose in list multiprobe more often MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DERBY-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254368#comment-14254368 ] Mike Matrigali commented on DERBY-6784: --------------------------------------- some thoughts so far: 1) costing for the IN LIST of N terms has to compete with costing of N OR clauses. Need to see what row estimates are for that list of OR's in the marker case. Maybe the is some multiplication of percentages going on there. 2) Not sure how to factor this in, but in looking at various customer apps in this area I often found that when there were a lot of terms in the IN LIST, the application actually did not expect to match on many of them. Almost always the optimizer is going to assume at least 1 row rather than 0 rows for an equality constraint in a where clause. Not sure if this matters for this work, but something to think about. 3) In running the benchmark and looking at army's comments there is a wierdness with compile time going way up as terms are added in the literal case vs the marker case. Should see if system is going to store for each of the or clauses, because in the literal case the optimizer is going to ask the store for an estimate for scan matching that literal, and store is going to do some actual work on the index to determine the count. This works out well for queries with just a few clauses, but in an IN CLAUSE with huge number of clauses I think it results in the optmization doing almost as much work as the actual query (actually I think army found it was more than the query - which again is a costing issue as the optimizer is meant to only take as much time as it estimates the query will take). Need to create a new JIRA for this and add info as I find it in the debugging process. Would be interesting to run benchmark on unique id's rather than current FOREIGN_KEY_UUID which is not marked unique. > change optimizer to choose in list multiprobe more often > -------------------------------------------------------- > > Key: DERBY-6784 > URL: https://issues.apache.org/jira/browse/DERBY-6784 > Project: Derby > Issue Type: Improvement > Components: SQL > Affects Versions: 10.11.1.1 > Reporter: Mike Matrigali > Assignee: Mike Matrigali > Attachments: DERBY_6784_diff_1.txt, Derby47PerformanceTest.java, current_perf.txt, patch_1_perf.txt > > > Using the multi-probe join strategy is an obvious performance win when > the optimizer chooses it. There are cases currently where the costing > makes the optimizer choose other plans which do not perform as well as > the multi-probe strategy. > The class of queries that are affected are those where the number of terms > in the IN LIST is large relative to the number of rows in the table, and there > is a useful index to probe for the column that is referenced by the IN LIST. > There are multiple benefits to choosing the multi-probe strategy, including > the following: > 1) often better execution time, where the alternative is to do a full table > merge on the column. > 2) The multi-probe strategy results in "pushing" the work into the store, > and this may result in more concurrent behavior (see DERBY-6300 and DERBY-6301). First less rows may > be locked by probing rather than full table scan (and in the worst case > same number if query manages to probe on every value in table). > Second depending on isolation level of the query store will only matching > rows, while in the current implementation all rows that are returned by > store for qualification above store will remain locked whether they > qualify or not. Especially in small table cases other query plan choices > have been changed to favor probing indexes rather than full table scans > even if pure cpu is better with table scan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)