Date: Thu, 18 Feb 2016 19:14:18 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@drill.apache.org
Subject: [jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query


    [ https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152896#comment-15152896 ]

ASF GitHub Bot commented on DRILL-4387:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/379#discussion_r53366236

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java ---
    @@ -87,9 +87,6 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS
               newColumns.add(column);
             }
           }
    -      if (newColumns.isEmpty()) {
    --- End diff --

    @amansinha100, I made a slight change to the patch to address the comments. Could you please take another look? Thanks!


> Improve execution side when it handles skipAll query
> ----------------------------------------------------
>
>                 Key: DRILL-4387
>                 URL: https://issues.apache.org/jira/browse/DRILL-4387
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.6.0
>
>
> DRILL-4279 changes the planner side and the RecordReader in the execution side when they handle a skipAll query. However, it seems there are other places in the codebase that do not handle a skipAll query efficiently. In particular, in GroupScan or ScanBatchCreator, we replace a NULL or empty column list with the star column. This essentially forces the execution side (RecordReader) to fetch all the columns from the data source. Such behavior leads to a big performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
> SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`;
> {code}
> The query does not require any regular column from the parquet file. However, ParquetRowGroupScan and ParquetScanBatchCreator will put the star column as the column list. If the table has dozens or hundreds of columns, this will make the SCAN operator much more expensive than necessary.
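
The cost described above can be illustrated with a small standalone sketch. This is not Drill code: the class `SkipAllSketch` and its methods `expandEmptyToStar`, `keepEmptyProjection`, and `columnsToRead` are hypothetical names made up for illustration, and the choice to still map a NULL list to the star column in the second variant is an assumption, not something the issue or the patch states. The sketch only contrasts "empty list becomes star" with "empty list stays empty" for a query such as the `dir1` example, which needs no regular column from the file.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Apache Drill.
public class SkipAllSketch {
    static final String STAR = "*";

    // Behavior the issue describes: a NULL or empty column list is replaced by star.
    static List<String> expandEmptyToStar(List<String> projected) {
        if (projected == null || projected.isEmpty()) {
            return Collections.singletonList(STAR);
        }
        return projected;
    }

    // Assumed alternative: only a missing (NULL) list means star;
    // an empty list is kept and means "no regular columns are needed".
    static List<String> keepEmptyProjection(List<String> projected) {
        if (projected == null) {
            return Collections.singletonList(STAR);
        }
        return projected;
    }

    // Counts how many of the table's columns a reader would have to fetch.
    static int columnsToRead(List<String> projected, List<String> tableColumns) {
        if (projected.contains(STAR)) {
            return tableColumns.size();
        }
        int count = 0;
        for (String column : projected) {
            if (tableColumns.contains(column)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("c1", "c2", "c3", "c4");
        // A skipAll query: only a directory column such as dir1 is referenced,
        // so the projection of regular columns is empty.
        List<String> skipAll = Collections.emptyList();

        System.out.println("empty -> star: "
                + columnsToRead(expandEmptyToStar(skipAll), table) + " columns read");
        System.out.println("empty kept:    "
                + columnsToRead(keepEmptyProjection(skipAll), table) + " columns read");
    }
}
```

With a table of dozens or hundreds of columns, the first variant scales with the table width, while the second reads no regular columns at all for this kind of query.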