Date: Mon, 3 Apr 2017 19:11:41 +0000 (UTC)
From: "Aihua Xu (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-16291) Hive fails when unions a parquet table with itself

    [ https://issues.apache.org/jira/browse/HIVE-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954028#comment-15954028 ]

Aihua Xu commented on HIVE-16291:
---------------------------------

[~ashutoshc] The problem is not with setting READ_ALL_COLUMNS to false. It is that when ids is empty and old is "0", newConfStr becomes ",0":

    if (old != null && !old.isEmpty()) {
      newConfStr = newConfStr + StringUtils.COMMA_STR + old;
    }

So when ids is empty, we just need to set READ_ALL_COLUMNS to false.

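
To make the comment above concrete, here is a minimal sketch of the append step. It is not the Hive source: the class name, both method names, and the "return the old value unchanged" guard are assumptions made for illustration, and a plain "," stands in for StringUtils.COMMA_STR.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the append logic discussed in the comment; not Hive code.
final class AppendReadColumnIdsSketch {

    // Joins the new ids and appends the old value, as in the quoted snippet.
    // With an empty id list the result starts with a stray comma.
    static String appendUnguarded(List<Integer> ids, String old) {
        String newConfStr = ids.stream().map(String::valueOf).collect(Collectors.joining(","));
        if (old != null && !old.isEmpty()) {
            newConfStr = newConfStr + "," + old;
        }
        return newConfStr;
    }

    // Assumed fix: when there are no new ids, leave the existing value untouched.
    // (In Hive the caller would still set READ_ALL_COLUMNS to false; not modelled here.)
    static String appendGuarded(List<Integer> ids, String old) {
        if (ids.isEmpty()) {
            return old;
        }
        return appendUnguarded(ids, old);
    }

    public static void main(String[] args) {
        List<Integer> noIds = java.util.Collections.emptyList();
        System.out.println("unguarded: \"" + appendUnguarded(noIds, "0") + "\"");
        System.out.println("guarded:   \"" + appendGuarded(noIds, "0") + "\"");
    }
}
```

With an empty id list and an old value of "0", the unguarded version yields ",0", which is the malformed value described above, while the guarded version keeps "0".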
> Hive fails when unions a parquet table with itself
> --------------------------------------------------
>
>                 Key: HIVE-16291
>                 URL: https://issues.apache.org/jira/browse/HIVE-16291
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>         Attachments: HIVE-16291.1.patch
>
>
> Reproduce commands:
> {code:sql}
> create table tst_unin (col1 int) partitioned by (p_tdate int) stored as parquet;
> insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310);
> insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410);
> select count(*) from (select tst_unin.p_tdate from tst_unin where tst_unin.col1=20160302 union all select tst_unin.p_tdate from tst_unin) t1;
> {code}
> The table is stored in Parquet format, which is a columnar file format. Hive tries to push the query predicates to the table scan operators so that only the needed columns are read. This is done by adding the needed column IDs to the job configuration under the property "hive.io.file.readcolumn.ids".
> In the above case, the query unions the results of 2 subqueries, which select data from the same table. The first subquery filters on "col1" and so needs that column from the Parquet file, while the second subquery doesn't need any column from it. Hive has a bug here: it ends up setting "hive.io.file.readcolumn.ids" to a value like "0,,0", which the method ColumnProjectionUtils.getReadColumnIDs cannot parse.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
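
As a rough illustration of why a value like "0,,0" is a problem for a consumer of "hive.io.file.readcolumn.ids", the sketch below parses a comma-separated id list the naive way. It is a hypothetical stand-in and does not reproduce ColumnProjectionUtils.getReadColumnIDs; the class and method names are invented, and the exact failure mode inside Hive may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a comma-separated column id list; not the Hive implementation.
final class ParseReadColumnIdsSketch {

    // Splits on commas and parses each element as an int.
    // An empty element (as in "0,,0" or ",0") makes Integer.parseInt throw.
    static List<Integer> parseIds(String conf) {
        List<Integer> result = new ArrayList<>();
        if (conf == null || conf.isEmpty()) {
            return result;
        }
        for (String element : conf.split(",")) {
            result.add(Integer.parseInt(element));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseIds("0,0"));
        try {
            parseIds("0,,0");
        } catch (NumberFormatException e) {
            System.out.println("cannot parse \"0,,0\": " + e.getMessage());
        }
    }
}
```

In this sketch the well-formed value "0,0" parses to two ids, whereas the empty element in "0,,0" causes a NumberFormatException.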