Date: Wed, 8 Apr 2015 03:11:13 +0000 (UTC)
From: "Dong Chen (JIRA)"
To: dev@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Created] (HIVE-10252) Make PPD work for Parquet in row group level

Dong Chen created HIVE-10252:
--------------------------------

             Summary: Make PPD work for Parquet in row group level
                 Key: HIVE-10252
                 URL: https://issues.apache.org/jira/browse/HIVE-10252
             Project: Hive
          Issue Type: Sub-task
            Reporter: Dong Chen
            Assignee: Dong Chen

In Hive, predicate pushdown (PPD) extracts the search condition from the HQL query, serializes it, and pushes it down to the file format. ORC can use the predicate to filter stripes. Similarly, Parquet should use the statistics saved in each row group to skip row groups that cannot match. But this does not work.

In {{ParquetRecordReaderWrapper}}, Hive gets the splits with all row groups (client side) and pushes the filter to Parquet for further processing (Parquet side).
But in {{ParquetRecordReader.initializeInternalReader()}}, if the splits have already been selected on the client side, Parquet will not apply the filter again.

We should make the behavior consistent in Hive. Maybe we could get the splits, filter them, and then pass them to Parquet. This means using the client-side strategy.
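
To make the client-side strategy concrete, here is a minimal sketch of filtering row groups by their min/max statistics before they are passed on to the reader. It does not use the Hive or Parquet APIs: {{RowGroup}}, {{RangePredicate}}, and {{filterRowGroups}} are hypothetical names made up for illustration, and the predicate is assumed to be a simple inclusive range on a single long column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupFilterSketch {

    /** Hypothetical stand-in for one row group and its min/max statistics on one column. */
    static final class RowGroup {
        final int index;
        final long min;
        final long max;

        RowGroup(int index, long min, long max) {
            this.index = index;
            this.min = min;
            this.max = max;
        }
    }

    /** Hypothetical pushed-down predicate: low <= column <= high. */
    static final class RangePredicate {
        final long low;
        final long high;

        RangePredicate(long low, long high) {
            this.low = low;
            this.high = high;
        }

        /** True if the row group's [min, max] range overlaps [low, high]. */
        boolean mayMatch(RowGroup group) {
            return group.max >= low && group.min <= high;
        }
    }

    /** Client-side step: keep only the row groups whose statistics may satisfy the predicate. */
    static List<RowGroup> filterRowGroups(List<RowGroup> all, RangePredicate predicate) {
        List<RowGroup> kept = new ArrayList<>();
        for (RowGroup group : all) {
            if (predicate.mayMatch(group)) {
                kept.add(group);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RowGroup> all = Arrays.asList(
                new RowGroup(0, 1, 100),
                new RowGroup(1, 101, 200),
                new RowGroup(2, 201, 300));
        RangePredicate predicate = new RangePredicate(150, 250);

        // Only the surviving row groups would be handed to the reader.
        for (RowGroup group : filterRowGroups(all, predicate)) {
            System.out.println("keep row group " + group.index);
        }
    }
}
```

With the sample data in {{main}}, row group 0 is dropped because its maximum (100) is below the lower bound of the predicate, while row groups 1 and 2 are kept. Statistics can only prove that a row group cannot match, so the rows in the kept groups would still need to be checked against the predicate.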