Date: Mon, 23 Nov 2015 14:37:00 -0800
Subject: Re: Moving directory based pruning to fire earlier
From: Sean Hsuan-Yi Chu
To: dev@drill.apache.org

Does that mean we would use the Hep planner to do directory pruning as the
first stage of logical planning?

I think it makes sense to allow rules that are certain to reduce the cost to
fire before Volcano.

How about expression reduction? I believe pruning sometimes needs the
simplified expressions in order to proceed.

On Mon, Nov 23, 2015 at 1:57 PM, Mehant Baid wrote:

> Currently all rules based on Calcite logical rels and Drill logical rels
> are put together and are fired together. As part of DRILL-3996, Jinfeng
> will break it down into different phases. I should be able to take
> advantage of this and move the directory based partition pruning to fire
> based on Calcite rels.
>
> Thanks
> Mehant
>
> On 11/23/15 10:58 AM, Hanifi GUNES wrote:
>
>> The general idea of multi-phase pruning makes sense to me. I am wondering,
>> though, are we referring to introducing a new planning phase before the
>> logical or separating out the logic so as to make directory pruning kick
>> off ahead of column partitioning?
>>
>> 2015-11-23 10:33 GMT-08:00 Mehant Baid:
>>
>>> As part of DRILL-3996, Jinfeng mentioned that he plans to move the
>>> directory based pruning rule earlier than column based pruning. I want
>>> to expand on that a little, provide the motivation and gather
>>> thoughts/feedback.
>>>
>>> Currently both the directory based pruning and the column based pruning
>>> are fired in the same planning phase and are based on Drill logical
>>> rels. This is not optimal in the case where data is organized in such a
>>> way that both directory based pruning and column based pruning can be
>>> applied (when the data is organized with a nested directory structure
>>> plus the individual files contain partition columns). As part of
>>> creating the Drill logical scan we read the footers of all the files
>>> involved. If the directory based pruning rule is fired earlier (rule to
>>> fire based on Calcite logical rels) then we will be able to prune out
>>> unnecessary directories and save the work of reading the footers of
>>> these files.
>>>
>>> Thanks
>>> Mehant
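
To make the cost argument in the quoted proposal concrete, here is a minimal sketch of the two orderings. It is a toy model, not Drill or Calcite code: `DataFile`, `FooterReader`, `plan_single_phase`, `plan_directory_first`, and the `FILES` layout are all hypothetical names made up for illustration, and a "footer read" is just a counter increment standing in for the real I/O.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str      # hypothetical layout "<dir0>/<file>"; first segment acts like dir0
    part_col: int  # stand-in for a partition column value stored in the file footer


class FooterReader:
    """Counts how many (assumed expensive) footer reads a plan performs."""

    def __init__(self):
        self.reads = 0

    def read(self, f: DataFile) -> int:
        self.reads += 1
        return f.part_col


def dir0(f: DataFile) -> str:
    # Top-level directory of the file, derived from the path alone.
    return f.path.split("/")[0]


def plan_single_phase(files, want_dir, want_col, reader):
    # The scan is built first, so every footer is read before any pruning.
    footers = [(f, reader.read(f)) for f in files]
    return [f for f, col in footers if dir0(f) == want_dir and col == want_col]


def plan_directory_first(files, want_dir, want_col, reader):
    # Phase 1: prune on directory names only; no footer is touched.
    survivors = [f for f in files if dir0(f) == want_dir]
    # Phase 2: read footers only for the survivors, then prune on the column.
    return [f for f in survivors if reader.read(f) == want_col]


# Hypothetical data set: three top-level directories with four files each.
FILES = [
    DataFile(f"{year}/f{i}.parquet", i % 2)
    for year in ("2014", "2015", "2016")
    for i in range(4)
]
```

Running both functions over `FILES` with a filter on one directory returns the same files, but the single-phase version reads all twelve footers while the directory-first version reads only the four under the selected directory. How much of that saving carries over to a real plan depends on details this sketch ignores, such as whether the filter expression has already been simplified enough for the directory rule to match.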