spark-reviews mailing list archives

From CodingCat <...@git.apache.org>
Subject [GitHub] spark pull request #19810: [SPARK-22599][SQL] In-Memory Table Pruning withou...
Date Wed, 06 Dec 2017 23:26:20 GMT
Github user CodingCat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19810#discussion_r155392065
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
    @@ -193,38 +195,68 @@ case class InMemoryTableScanExec(
     
       private val inMemoryPartitionPruningEnabled = sqlContext.conf.inMemoryPartitionPruning
     
    +  private def doFilterCachedBatches(
    +      rdd: RDD[CachedBatch],
    +      partitionStatsSchema: Seq[AttributeReference]): RDD[CachedBatch] = {
    +    val schemaIndex = partitionStatsSchema.zipWithIndex
    +    rdd.mapPartitionsWithIndex {
    +      case (partitionIndex, cachedBatches) =>
    +        if (inMemoryPartitionPruningEnabled) {
    +          cachedBatches.filter { cachedBatch =>
    +            val partitionFilter = newPredicate(
    +              partitionFilters.reduceOption(And).getOrElse(Literal(true)),
    +              partitionStatsSchema)
    +            partitionFilter.initialize(partitionIndex)
    +            if (!partitionFilter.eval(cachedBatch.stats)) {
    --- End diff --
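For context, the pruning in the diff above evaluates the pushed-down partition filters against each batch's statistics row rather than its data. Below is a minimal, framework-free sketch of that idea, assuming each cached batch carries per-column min/max statistics; the names (BatchStats, Batch, pruneBatches) are hypothetical stand-ins for Spark's CachedBatch stats machinery, not the actual API:

    final case class BatchStats(min: Map[String, Long], max: Map[String, Long])
    final case class Batch[T](rows: Seq[T], stats: BatchStats)

    // Keep only the batches whose statistics can possibly satisfy the
    // predicate; pruned batches are never decompressed or scanned.
    def pruneBatches[T](
        batches: Iterator[Batch[T]],
        statsPredicate: BatchStats => Boolean): Iterator[Batch[T]] =
      batches.filter(b => statsPredicate(b.stats))

    // Example: for the filter `key > 10`, keep a batch only if max(key) > 10.
    // val kept = pruneBatches(batches, stats => stats.max("key") > 10)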
    
@sadikovi this while loop is building a CachedBatch; it only decides when to seal the current batch's building window and start the next one. So, either way, you need to go through all of the records in the partition.
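For illustration, here is a minimal sketch (not Spark's actual code) of the batch-building pattern described above, where the sealing condition only controls where one batch ends and the next begins, and every row in the partition is still consumed exactly once:

    import scala.collection.mutable.ArrayBuffer

    // Group a partition's rows into batches. The sealing decision (here a
    // simple row-count threshold) decides when to close the current batch;
    // all rows are visited regardless.
    def buildBatches[T](rows: Iterator[T], batchSize: Int): Iterator[Seq[T]] =
      new Iterator[Seq[T]] {
        override def hasNext: Boolean = rows.hasNext
        override def next(): Seq[T] = {
          val buffer = ArrayBuffer.empty[T]
          while (rows.hasNext && buffer.length < batchSize) {
            buffer += rows.next()
          }
          buffer.toSeq
        }
      }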



