Date: Thu, 22 Dec 2016 21:01:58 +0000 (UTC)
From: "Chao Sun (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet

     [ https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15055:
----------------------------
    Attachment: design-doc-nested-column-pruning.pdf

> Column pruning for nested fields in Parquet
> -------------------------------------------
>
>                 Key: HIVE-15055
>                 URL: https://issues.apache.org/jira/browse/HIVE-15055
>             Project: Hive
>          Issue Type: New Feature
>          Components: Logical Optimizer, Physical Optimizer, Serializers/Deserializers
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>              Labels: performance
>         Attachments: design-doc-nested-column-pruning.pdf
>
>
> Some columnar file formats such as Parquet also store the fields of a struct type column by column, using the encoding described in the Google Dremel paper. It is very common in big data for data to be stored in structs while queries need only a subset of the fields in those structs. However, Hive currently still has to read the whole struct regardless of whether all of its fields are selected. Pruning unwanted sub-fields of structs (nested fields) at file-reading time would therefore be a big performance boost for such scenarios.
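
The pruning idea described in the issue can be illustrated with a small sketch. This is not Hive's implementation and does not use the Parquet API. It only models a nested schema as a Python dict and keeps the sub-fields a query asks for. The function name prune_schema, the dotted-path notation, and the sample schema are all hypothetical and chosen for illustration.

```python
import copy


def prune_schema(schema, paths):
    """Return a copy of `schema` that keeps only the fields named in `paths`.

    `schema` maps a field name to a type string (leaf) or to a nested dict
    (struct). Each path is dotted, e.g. "user.address.city". Naming a struct
    without a sub-field keeps the whole struct.
    """
    wanted = {}  # top-level name -> list of sub-paths, or None for "whole field"
    for path in paths:
        head, _, rest = path.partition(".")
        if head not in schema:
            raise KeyError(f"unknown field: {path}")
        if not rest:
            wanted[head] = None  # the whole field was requested
        elif head not in wanted:
            wanted[head] = [rest]
        elif wanted[head] is not None:
            wanted[head].append(rest)

    pruned = {}
    for name, field in schema.items():  # iterate the schema to keep its field order
        if name not in wanted:
            continue  # field not referenced by the query: pruned away
        sub_paths = wanted[name]
        if sub_paths is None:
            pruned[name] = copy.deepcopy(field)
        elif isinstance(field, dict):
            pruned[name] = prune_schema(field, sub_paths)
        else:
            raise KeyError(f"{name} is not a struct")
    return pruned


# Hypothetical table schema with nested structs.
schema = {
    "id": "bigint",
    "user": {
        "name": "string",
        "address": {"city": "string", "zip": "string"},
    },
    "payload": {"a": "int", "b": "binary"},
}

# A query such as SELECT id, user.address.city only needs these two paths.
pruned = prune_schema(schema, ["id", "user.address.city"])
print(pruned)  # {'id': 'bigint', 'user': {'address': {'city': 'string'}}}
```

In a columnar format, a reader handed the pruned schema would only need to touch the column chunks for id and user.address.city, which is where the expected savings would come from.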