From: GitBox
To: dev@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [GitHub] [drill] amansinha100 commented on a change in pull request #1771: DRILL-7199: Optimize population of metadata for non-interesting columns
Date: Wed, 01 May 2019 15:58:14 -0000
Message-ID: <155672629429.29682.2966378668743562360.gitbox@gitbox.apache.org>

amansinha100 commented on a change in pull request #1771: DRILL-7199: Optimize population of metadata for non-interesting columns
URL: https://github.com/apache/drill/pull/1771#discussion_r280106679

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetTableMetadataUtils.java
##########
@@ -286,25 +287,27 @@ public static PartitionMetadata getPartitionMetadata(SchemaPath partitionColumn,
       statistics.put(ColumnStatisticsKind.NULLS_COUNT, nulls);
       columnsStatistics.put(colPath, new ColumnStatisticsImpl(statistics, comparator));
     }
-    columnsStatistics.putAll(populateNonInterestingColumnsStats(columnsStatistics.keySet(), tableMetadata));
     return columnsStatistics;
   }
 
   /**
-   * Populates the non-interesting column's statistics
-   * @param schemaPaths columns paths which should be ignored
+   * Returns the non-interesting column's metadata
    * @param parquetTableMetadata the source of column metadata for non-interesting column's statistics
-   * @return returns non-interesting column statistics map
+   * @return returns non-interesting columns metadata
    */
-  @SuppressWarnings("unchecked")
-  public static Map populateNonInterestingColumnsStats(
-      Set schemaPaths, MetadataBase.ParquetTableMetadataBase parquetTableMetadata) {
+  public static NonInterestingColumnsMetadata getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata) {
     Map columnsStatistics = new HashMap<>();
     if (parquetTableMetadata instanceof Metadata_V4.ParquetTableMetadata_v4) {
-      for (Metadata_V4.ColumnTypeMetadata_v4 columnTypeMetadata :
-          ((Metadata_V4.ParquetTableMetadata_v4) parquetTableMetadata).getColumnTypeInfoMap().values()) {
-        SchemaPath schemaPath = SchemaPath.getCompoundPath(columnTypeMetadata.name);
-        if (!schemaPaths.contains(schemaPath)) {
+      ConcurrentHashMap columnTypeInfoMap =
+          ((Metadata_V4.ParquetTableMetadata_v4) parquetTableMetadata).getColumnTypeInfoMap();
+
+      if (columnTypeInfoMap == null) {
+        return new NonInterestingColumnsMetadata(columnsStatistics);
+      } // in some cases for runtime pruning

Review comment:
   It's not clear what this comment means ..

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services
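For illustration only: a minimal, self-contained Java sketch of the guard-then-build pattern in the new getNonInterestingColumnsMeta, with the explanation for the null check written on its own lines above the `if`, which is one possible way to answer the review question. Every name here (`NonInterestingColumnsSketch`, `ColumnTypeInfo`, its `interesting` flag, `UNKNOWN_NULLS_COUNT`) is a hypothetical stand-in, not Drill's Metadata_V4 API. Because the hunk stops at the commented line, how the real method selects non-interesting columns is not shown and is assumed here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained sketch; none of these names are Drill's real API.
final class NonInterestingColumnsSketch {

  /** Hypothetical sentinel meaning "nulls count unknown" for columns without statistics. */
  static final long UNKNOWN_NULLS_COUNT = -1L;

  /** Hypothetical per-column type info; the "interesting" flag is an assumption. */
  static final class ColumnTypeInfo {
    final String name;
    final boolean interesting;

    ColumnTypeInfo(String name, boolean interesting) {
      this.name = name;
      this.interesting = interesting;
    }
  }

  /** Hypothetical stand-in for NonInterestingColumnsMetadata: an immutable column-to-nulls-count map. */
  static final class NonInterestingColumnsMetadata {
    private final Map<String, Long> nullsCountByColumn;

    NonInterestingColumnsMetadata(Map<String, Long> nullsCountByColumn) {
      // Defensive copy wrapped read-only, so callers cannot mutate the holder.
      this.nullsCountByColumn = Collections.unmodifiableMap(new HashMap<>(nullsCountByColumn));
    }

    Map<String, Long> getNullsCountByColumn() {
      return nullsCountByColumn;
    }
  }

  static NonInterestingColumnsMetadata getNonInterestingColumnsMeta(
      Map<String, ColumnTypeInfo> columnTypeInfoMap) {
    Map<String, Long> nullsCounts = new HashMap<>();

    // The column type-info map may be absent (null). This sketch assumes that is
    // the runtime-pruning case the original trailing comment refers to, and returns
    // an empty holder instead of failing with a NullPointerException.
    if (columnTypeInfoMap == null) {
      return new NonInterestingColumnsMetadata(nullsCounts);
    }

    // Collect entries only for non-interesting columns. Callers pass no exclusion
    // set, mirroring the removal of the schemaPaths parameter in the diff.
    for (ColumnTypeInfo info : columnTypeInfoMap.values()) {
      if (!info.interesting) {
        nullsCounts.put(info.name, UNKNOWN_NULLS_COUNT);
      }
    }
    return new NonInterestingColumnsMetadata(nullsCounts);
  }
}
```

Returning an empty holder rather than `null` from the guard means callers need no second null check; whether the real code path relies on that is not visible in this hunk.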