drill-dev mailing list archives

From jinfengni <...@git.apache.org>
Subject [GitHub] drill pull request: DRILL-3735: For partition pruning divide up th...
Date Tue, 15 Sep 2015 00:28:40 GMT
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/156#discussion_r39463605
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
    @@ -176,81 +177,103 @@ protected void doOnMatch(RelOptRuleCall call, DrillFilterRel filterRel, DrillPro
         RexNode pruneCondition = c.getFinalCondition();
     
         if (pruneCondition == null) {
    +      logger.debug("No conditions were found eligible for partition pruning.");
           return;
         }
     
     
         // set up the partitions
    -    final GroupScan groupScan = scanRel.getGroupScan();
    -    List<PartitionLocation> partitions = descriptor.getPartitions();
    -
    -    if (partitions.size() > Character.MAX_VALUE) {
    -      return;
    -    }
    -
    -    final NullableBitVector output = new NullableBitVector(MaterializedField.create("", Types.optional(MinorType.BIT)), allocator);
    -    final VectorContainer container = new VectorContainer();
    -
    -    try {
    -      final ValueVector[] vectors = new ValueVector[descriptor.getMaxHierarchyLevel()];
    -      for (int partitionColumnIndex : BitSets.toIter(partitionColumnBitSet)) {
    -        SchemaPath column = SchemaPath.getSimplePath(fieldNameMap.get(partitionColumnIndex));
    -        MajorType type = descriptor.getVectorType(column, settings);
    -        MaterializedField field = MaterializedField.create(column, type);
    -        ValueVector v = TypeHelper.getNewVector(field, allocator);
    -        v.allocateNew();
    -        vectors[partitionColumnIndex] = v;
    -        container.add(v);
    -      }
    -
    -      // populate partition vectors.
    -      descriptor.populatePartitionVectors(vectors, partitions, partitionColumnBitSet, fieldNameMap);
    -
    -      // materialize the expression
    -      logger.debug("Attempting to prune {}", pruneCondition);
    -      final LogicalExpression expr = DrillOptiq.toDrill(new DrillParseContext(settings), scanRel, pruneCondition);
    -      final ErrorCollectorImpl errors = new ErrorCollectorImpl();
    -
    -      LogicalExpression materializedExpr = ExpressionTreeMaterializer.materialize(expr, container, errors, optimizerContext.getFunctionRegistry());
    -      // Make sure pruneCondition's materialized expression is always of BitType, so that
    -      // it's same as the type of output vector.
    -      if (materializedExpr.getMajorType().getMode() == TypeProtos.DataMode.REQUIRED) {
    -        materializedExpr = ExpressionTreeMaterializer.convertToNullableType(
    -            materializedExpr,
    -            materializedExpr.getMajorType().getMinorType(),
    -            optimizerContext.getFunctionRegistry(),
    -            errors);
    +    List<String> newFiles = Lists.newArrayList();
    +    long numTotal = 0; // total number of partitions
    +    int batchIndex = 0;
    +    String firstLocation = null;
    +
    +    // Outer loop: iterate over a list of batches of PartitionLocations
    +    for (List<PartitionLocation> partitions : descriptor) {
    +      numTotal += partitions.size();
    +      logger.debug("Evaluating partition pruning for batch {}", batchIndex);
    +      if (batchIndex == 0) { // save the first location in case everything is pruned
    +        firstLocation = partitions.get(0).getEntirePartitionLocation();
           }
    +      final NullableBitVector output = new NullableBitVector(MaterializedField.create("", Types.optional(MinorType.BIT)), allocator);
    +      final VectorContainer container = new VectorContainer();
    +
    +      try {
    +        final ValueVector[] vectors = new ValueVector[descriptor.getMaxHierarchyLevel()];
    +          for (int partitionColumnIndex : BitSets.toIter(partitionColumnBitSet)) {
    +          SchemaPath column = SchemaPath.getSimplePath(fieldNameMap.get(partitionColumnIndex));
    +          MajorType type = descriptor.getVectorType(column, settings);
    +          MaterializedField field = MaterializedField.create(column, type);
    +          ValueVector v = TypeHelper.getNewVector(field, allocator);
    +          v.allocateNew();
    +          vectors[partitionColumnIndex] = v;
    +          container.add(v);
    +        }
     
    -      if (errors.getErrorCount() != 0) {
    -        logger.warn("Failure while materializing expression [{}].  Errors: {}", expr, errors);
    -      }
    +        // populate partition vectors.
    +        descriptor.populatePartitionVectors(vectors, partitions, partitionColumnBitSet, fieldNameMap);
    +
    +        // materialize the expression
    +        logger.debug("Attempting to prune {}", pruneCondition);
    +        final LogicalExpression expr = DrillOptiq.toDrill(new DrillParseContext(settings), scanRel, pruneCondition);
    +        final ErrorCollectorImpl errors = new ErrorCollectorImpl();
    +
    +        LogicalExpression materializedExpr = ExpressionTreeMaterializer.materialize(expr, container, errors, optimizerContext.getFunctionRegistry());
    +        // Make sure pruneCondition's materialized expression is always of BitType, so that
    +        // it's same as the type of output vector.
    +        if (materializedExpr.getMajorType().getMode() == TypeProtos.DataMode.REQUIRED) {
    +          materializedExpr = ExpressionTreeMaterializer.convertToNullableType(
    +              materializedExpr,
    +              materializedExpr.getMajorType().getMinorType(),
    +              optimizerContext.getFunctionRegistry(),
    +              errors);
    +        }
     
    -      output.allocateNew(partitions.size());
    -      InterpreterEvaluator.evaluate(partitions.size(), optimizerContext, container, output, materializedExpr);
    -      int record = 0;
    +        if (errors.getErrorCount() != 0) {
    --- End diff --
    
    If the expression materializer reports an error, would it be better to stop executing the partition pruning rule and raise an exception here? In that case, I think the interpreter would likely hit an error as well.
    
    Also, is the condition expression the same across the sub-lists of partition locations? If so, would it be better to move the expression materialization out of this for loop? Then we would not have to materialize the expression every time we process a sub-list.
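
    To illustrate both suggestions, here is a minimal, self-contained sketch. It is not Drill code: `BatchedPruningSketch`, `Materialized`, `materialize`, and `prune` are hypothetical stand-ins, partition locations are plain strings, and the prune condition is a simple path prefix. It also assumes the materialized expression does not depend on per-batch state. In the diff above, `materialize` takes the per-batch `container`, so that assumption would need to be confirmed before hoisting the call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class BatchedPruningSketch {

  /** Hypothetical stand-in for a materialized prune condition plus collected errors. */
  static final class Materialized {
    final Predicate<String> predicate;
    final List<String> errors;

    Materialized(Predicate<String> predicate, List<String> errors) {
      this.predicate = predicate;
      this.errors = errors;
    }
  }

  /** Counts calls so the effect of hoisting is observable. */
  static int materializeCalls = 0;

  /** Hypothetical stand-in for expression materialization (assumed to be costly). */
  static Materialized materialize(String requiredPrefix) {
    materializeCalls++;
    List<String> errors = new ArrayList<>();
    if (requiredPrefix == null || requiredPrefix.isEmpty()) {
      errors.add("empty prune condition");
      return new Materialized(location -> true, errors);
    }
    return new Materialized(location -> location.startsWith(requiredPrefix), errors);
  }

  /** Returns the locations that survive pruning, processing one batch at a time. */
  static List<String> prune(List<List<String>> batches, String requiredPrefix) {
    // Materialize once, before the loop, because the condition is the same for every batch.
    Materialized materialized = materialize(requiredPrefix);

    // Fail fast instead of logging a warning and evaluating a broken expression.
    if (!materialized.errors.isEmpty()) {
      throw new IllegalStateException(
          "Failure while materializing expression. Errors: " + materialized.errors);
    }

    List<String> kept = new ArrayList<>();
    for (List<String> batch : batches) {
      for (String location : batch) {
        if (materialized.predicate.test(location)) {
          kept.add(location);
        }
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<List<String>> batches = Arrays.asList(
        Arrays.asList("/data/2015/01", "/data/2014/12"),
        Arrays.asList("/data/2015/02"));
    System.out.println(prune(batches, "/data/2015"));
    System.out.println("materialize calls: " + materializeCalls);
  }
}
```

    In this sketch the per-batch work is reduced to evaluation only, and a materialization error aborts pruning before any batch is processed. Whether an exception or a silent skip of the rule is the right failure mode for the planner is a separate design question.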



