Date: Mon, 12 Jan 2015 22:11:35 +0000 (UTC)
From: "Jinfeng Ni (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-1500) Partition filtering might lead to an unnecessary column in the result set.

[ https://issues.apache.org/jira/browse/DRILL-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274265#comment-14274265 ]

Jinfeng Ni commented on DRILL-1500:
-----------------------------------

+1. The patch looks good to me. The issue seems to have been introduced in the flatten operator work. I'm wondering how we could prevent such a ProjectPrel replacement in any future PrelVisitor. One idea is to make the ProjectAllowDupPrel constructor private and publicly expose only the copy() method. Also, add a public static method to explicitly create a new instance of this special type of ProjectPrel.
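A minimal Java sketch of the idea, for illustration only. The classes below are hypothetical stand-ins, not Drill's real ProjectPrel hierarchy (the real classes extend Calcite rel nodes and take different constructor arguments). The field list and the createAllowingDuplicates() name are assumptions:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an ordinary project node.
// The real ProjectPrel has a different constructor and superclass.
class ProjectPrel {
    private final List<String> fields;

    public ProjectPrel(List<String> fields) {
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    // Visitors are expected to rebuild nodes through copy(), never through "new".
    public ProjectPrel copy(List<String> newFields) {
        return new ProjectPrel(newFields);
    }

    public List<String> getFields() {
        return fields;
    }
}

// Hypothetical stand-in for the special project that tolerates duplicate names.
final class ProjectAllowDupPrel extends ProjectPrel {

    // Private: no visitor can construct this type by accident.
    private ProjectAllowDupPrel(List<String> fields) {
        super(fields);
    }

    // The single, explicit entry point for creating the special node
    // (method name is an assumption made for this sketch).
    public static ProjectAllowDupPrel createAllowingDuplicates(List<String> fields) {
        return new ProjectAllowDupPrel(fields);
    }

    // copy() preserves the runtime type, so a visitor that calls copy()
    // cannot silently downgrade this node to a plain ProjectPrel.
    @Override
    public ProjectAllowDupPrel copy(List<String> newFields) {
        return new ProjectAllowDupPrel(newFields);
    }
}
```

With this shape, a visitor that holds only a ProjectPrel reference and calls copy() gets back the same kind of node it started with. Code that really needs the special node has to call the static method by name, which is easy to spot in review.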
This might help prevent similar issues from happening in the future.

> Partition filtering might lead to an unnecessary column in the result set.
> ---------------------------------------------------------------------------
>
> Key: DRILL-1500
> URL: https://issues.apache.org/jira/browse/DRILL-1500
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Aman Sinha
> Priority: Critical
> Fix For: 0.8.0
>
> Attachments: 0001-DRILL-1500-Partial-fix-Don-t-overwrite-top-level-Pro.patch
>
>
> When partition filtering is used together with a select * query, Drill might return the partitioning column twice.
> Q1 :
> {code}
> select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` where dir0=1994 and dir1='Q1' order by dir0 limit 1;
> +------------+------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> | dir00 | dir0 | dir1 | o_clerk | o_comment | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice |
> +------------+------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> | 1994 | 1994 | Q1 | Clerk#000000743 | y pending requests integrate | 1292 | 1994-01-20 | 66 | 5-LOW | F | 0 | 104190.66 |
> +------------+------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> 1 row selected (2.097 seconds)
> {code}
> We can see that column "dir0" appeared twice in the result set.
> In comparison, here is the query without partition filtering and the query result:
> Q2:
> {code}
> select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` order by dir0 limit 1;
> +------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> | dir0 | dir1 | o_clerk | o_comment | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice |
> +------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> | 1994 | Q1 | Clerk#000000743 | y pending requests integrate | 1292 | 1994-01-20 | 66 | 5-LOW | F | 0 | 104190.66 |
> +------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> 1 row selected (0.761 seconds)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)