Date: Mon, 27 Mar 2017 15:17:41 +0000 (UTC)
From: "Harshvardhan Gupta (JIRA)"
To: derby-dev@db.apache.org
Subject: [jira] [Commented] (DERBY-6921) How good is the Derby Query Optimizer, really

    [ https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943436#comment-15943436 ]

Harshvardhan Gupta commented on DERBY-6921:
-------------------------------------------

Thanks for reviewing the proposal. I'll make sure to go through the above-mentioned resources and past GSoC proposals thoroughly to further refine my proposal.

> How good is the Derby Query Optimizer, really
> ---------------------------------------------
>
>                 Key: DERBY-6921
>                 URL: https://issues.apache.org/jira/browse/DERBY-6921
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Bryan Pendleton
>            Priority: Minor
>              Labels: database, gsoc2017, java, optimizer
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> At the 2015 VLDB conference, a team led by Dr.
> Viktor Leis at Munich Technical University introduced a new benchmark
> suite for evaluating database query optimizers:
> http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publicly available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publicly available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools and the data set.
> 2) Run the benchmark.
>     2a) Some of the benchmark queries may reveal bugs in Derby.
>         For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
>    analyze the results.
>     3a) Some of the benchmark queries may reveal opportunities
>         for Derby to improve the query plans that it chooses for
>         various classes of queries (this is explained in detail in the
>         VLDB paper and other information available at Dr. Leis's site).
>         For each such improvement, we need to isolate the issue,
>         report it as a separable improvement, and fix it (if we can).
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find and fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator, and when does
>    it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
>    the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
>    appropriately sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please keep these
> considerations in mind:
> 1) This is NOT an introductory project. You must be quite familiar
>    with DBMS systems, and with SQL, and in particular with
>    cost-based query optimization. If terms such as "cardinality
>    estimation", "correlated query predicates", or "bushy trees"
>    aren't comfortable terms for you, this probably isn't the
>    project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
>    of the extensive body of introductory material on Derby to
>    become familiar with it: read the Derby Getting Started manual,
>    download the software and follow the tutorials, read the documentation,
>    download the source code and learn how to build and run the
>    test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
>    need to read the paper(s), study the benchmark queries, and
>    propose a detailed plan for how to use this benchmark as a tool
>    for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
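
A note on question 1 in the quoted description (how good is the cardinality estimator): the VLDB paper scores estimates with the q-error, the factor by which an estimated row count differs from the actual one. The sketch below is only an illustration of that metric. The class name, the method names, the clamping of counts below 1, and the sample numbers are all assumptions made for this example; none of it is Derby code or Derby output.

```java
import java.util.Arrays;

/** Sketch: q-error of estimated row counts versus actual row counts. */
public class QErrorSketch {

    /**
     * q-error = max(estimated/actual, actual/estimated).
     * 1.0 means a perfect estimate; over- and under-estimates by the
     * same factor get the same score.
     * Counts below 1 are clamped to 1 to avoid dividing by zero
     * (an assumption made for this sketch).
     */
    public static double qError(double estimated, double actual) {
        double est = Math.max(estimated, 1.0);
        double act = Math.max(actual, 1.0);
        return Math.max(est / act, act / est);
    }

    /** Median q-error over paired estimated and actual row counts. */
    public static double medianQError(double[] estimated, double[] actual) {
        if (estimated.length != actual.length || estimated.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        int n = estimated.length;
        double[] errors = new double[n];
        for (int i = 0; i < n; i++) {
            errors[i] = qError(estimated[i], actual[i]);
        }
        Arrays.sort(errors);
        int mid = n / 2;
        // Odd count: middle element; even count: mean of the two middle elements.
        return (n % 2 == 1) ? errors[mid] : (errors[mid - 1] + errors[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical row counts for three operators, not measurements from Derby.
        double[] estimated = {100, 10, 5000};
        double[] actual = {100, 1000, 50};
        System.out.println("median q-error: " + medianQError(estimated, actual));
    }
}
```

In practice the estimated and actual counts would have to come from the optimizer's plan output and from actually executing each benchmark query; how to extract them from Derby is left open here.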