drill-dev mailing list archives

From jinfengni <...@git.apache.org>
Subject [GitHub] drill pull request #637: Drill 1950 : Parquet row group filter pushdown.
Date Thu, 03 Nov 2016 21:41:19 GMT
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/637#discussion_r86449453
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
    @@ -1000,6 +1053,81 @@ public long getColumnValueCount(SchemaPath column) {
     
       @Override
       public List<SchemaPath> getPartitionColumns() {
    -    return new ArrayList<>(columnTypeMap.keySet());
    +    return new ArrayList<>(partitionColTypeMap.keySet());
       }
    +
    +  public GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities,
    +      FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager) {
    +    if (fileSet.size() == 1 || ! (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v3)) {
    +      return null; // no pruning for 1 single parquet file or metadata is prior v3.
    +    }
    +
    +    final Set<SchemaPath> schemaPathsInExpr = filterExpr.accept(new ParquetRGFilterEvaluator.FieldReferenceFinder(), null);
    +
    +    final List<RowGroupMetadata> qualifiedRGs = new ArrayList<>(parquetTableMetadata.getFiles().size());
    +    Set<String> qualifiedFileNames = Sets.newHashSet(); // HashSet keeps a fileName unique.
    +
    +    ParquetFilterPredicate filterPredicate = null;
    +
    +    for (ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
    +      final ImplicitColumnExplorer columnExplorer = new ImplicitColumnExplorer(optionManager, this.columns);
    +      Map<String, String> implicitColValues = columnExplorer.populateImplicitColumns(file.getPath(), selectionRoot);
    +
    +      for (RowGroupMetadata rowGroup : file.getRowGroups()) {
    +        ParquetMetaStatCollector statCollector = new ParquetMetaStatCollector(
    +            parquetTableMetadata,
    +            rowGroup.getColumns(),
    +            implicitColValues);
    +
    +        Map<SchemaPath, ColumnStatistics> columnStatisticsMap = statCollector.collectColStat(schemaPathsInExpr);
    --- End diff --
    
    Right. The filter predicate should be built only once. It is constructed inside the loop
only because we need the column type information during filter expression materialization,
for both regular columns and implicit columns.
    
    I added a check, if (filterPredicate == null), inside the loop so that the filter predicate
is built only once.
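
A minimal sketch of the build-once pattern described above, not the actual patch. The class, the RowGroup type, buildPredicate, and buildCount are all hypothetical stand-ins rather than Drill APIs. The point it illustrates is that the predicate needs column type information that only becomes available inside the loop, so it is created lazily on the first row group and reused afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the loop in applyFilter(); none of these types are Drill classes.
class LazyPredicateSketch {

  // Counts how many times the (assumed expensive) predicate materialization runs.
  static int buildCount = 0;

  // A row group reduced to the type and min/max statistics of the single filtered column.
  static final class RowGroup {
    final String name;
    final String columnType;
    final long min;
    final long max;

    RowGroup(String name, String columnType, long min, long max) {
      this.name = name;
      this.columnType = columnType;
      this.min = min;
      this.max = max;
    }
  }

  // Materializes a "column > threshold might match" predicate; it needs the column type first.
  static Predicate<RowGroup> buildPredicate(String columnType, long threshold) {
    buildCount++;
    if (!"BIGINT".equals(columnType)) {
      throw new IllegalArgumentException("unsupported column type: " + columnType);
    }
    // A row group can contain matching rows only if its max exceeds the threshold.
    return rg -> rg.max > threshold;
  }

  // Returns the names of the row groups that cannot be pruned.
  static List<String> prune(List<List<RowGroup>> files, long threshold) {
    List<String> qualified = new ArrayList<>();
    Predicate<RowGroup> filterPredicate = null;

    for (List<RowGroup> file : files) {
      for (RowGroup rowGroup : file) {
        // Type information is only known here, so build lazily and only once.
        if (filterPredicate == null) {
          filterPredicate = buildPredicate(rowGroup.columnType, threshold);
        }
        if (filterPredicate.test(rowGroup)) {
          qualified.add(rowGroup.name);
        }
      }
    }
    return qualified;
  }

  public static void main(String[] args) {
    List<RowGroup> file1 = Arrays.asList(
        new RowGroup("f1-rg0", "BIGINT", 0, 10),
        new RowGroup("f1-rg1", "BIGINT", 50, 90));
    List<RowGroup> file2 = Arrays.asList(
        new RowGroup("f2-rg0", "BIGINT", 30, 45),
        new RowGroup("f2-rg1", "BIGINT", 0, 40));

    System.out.println(prune(Arrays.asList(file1, file2), 40));
    System.out.println("predicate builds: " + buildCount);
  }
}
```

The sketch assumes every row group reports the same type for the filtered column, which is what makes reusing the first predicate safe here.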

