From: Mehant Baid
Subject: Moving directory based pruning to fire earlier
To: dev@drill.apache.org
Date: Mon, 23 Nov 2015 10:33:28 -0800
Message-ID: <56535BF8.8010208@gmail.com>

As part of DRILL-3996, Jinfeng mentioned that he plans to move the directory-based pruning rule earlier than column-based pruning. I want to expand on that a little, provide the motivation, and gather thoughts and feedback.

Currently, both directory-based pruning and column-based pruning are fired in the same planning phase, and both are based on Drill logical rels.
This is not optimal when the data is organized so that both directory-based pruning and column-based pruning can be applied (that is, when the data sits in a nested directory structure and the individual files also contain partition columns). As part of creating the Drill logical scan, we read the footers of all the files involved. If the directory-based pruning rule is fired earlier (as a rule based on Calcite logical rels), we can prune out unnecessary directories first and save the work of reading the footers of the files they contain.

Thanks,
Mehant
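
To illustrate the saving described above, here is a minimal Python sketch. It is not Drill code: `DataFile`, `plan`, the `region` partition column, and the sample paths are all hypothetical names made up for the example, and a footer read is simulated by a dictionary lookup. It only shows that pruning on directories before the scan is created reduces the number of footers read while selecting the same files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str           # path relative to the table root, e.g. "2015/a.parquet"
    footer_region: str  # partition column value, only known after reading the footer


def plan(files, dir0, region, prune_dirs_first):
    """Return (selected paths, number of footers read) for a filter on dir0 and region."""
    def in_dir(f):
        return f.path.split("/")[0] == dir0

    # Early directory pruning needs only the path, not the footer.
    candidates = [f for f in files if in_dir(f)] if prune_dirs_first else list(files)

    # Creating the scan reads the footer of every remaining file (simulated here).
    footers = {f.path: f.footer_region for f in candidates}
    footers_read = len(footers)

    # Pruning after scan creation: apply both the directory and the column filter.
    selected = [f.path for f in candidates if in_dir(f) and footers[f.path] == region]
    return selected, footers_read


# Hypothetical table layout: two directories, two files each.
FILES = [
    DataFile("2014/a.parquet", "east"),
    DataFile("2014/b.parquet", "west"),
    DataFile("2015/a.parquet", "east"),
    DataFile("2015/b.parquet", "west"),
]

late = plan(FILES, "2015", "east", prune_dirs_first=False)
early = plan(FILES, "2015", "east", prune_dirs_first=True)
print("same phase:", late)
print("directory pruning first:", early)
```

In this toy layout both orderings select the same file, but the early-pruning variant reads only the footers under the matching directory. The real saving would depend on how many directories the filter eliminates.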