spark-reviews mailing list archives

From vanzin <...@git.apache.org>
Subject [GitHub] spark pull request #15112: [RFC][SPARK-17549][sql] Only collect table size s...
Date Thu, 15 Sep 2016 18:16:22 GMT
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15112#discussion_r79028228
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ---
    @@ -74,21 +71,12 @@ case class InMemoryRelation(
       @transient val partitionStatistics = new PartitionStatistics(output)
     
       override lazy val statistics: Statistics = {
    -    if (batchStats.value.isEmpty) {
    +    if (batchStats.value == 0L) {
           // Underlying columnar RDD hasn't been materialized, no useful statistics information
           // available, return the default statistics.
           Statistics(sizeInBytes = child.sqlContext.conf.defaultSizeInBytes)
         } else {
    -      // Underlying columnar RDD has been materialized, required information has also been
    -      // collected via the `batchStats` accumulator.
    -      val sizeOfRow: Expression =
    -        BindReferences.bindReference(
    -          output.map(a => partitionStatistics.forAttribute(a).sizeInBytes).reduce(Add),
    -          partitionStatistics.schema)
    -
    -      val sizeInBytes =
    -        batchStats.value.asScala.map(row => sizeOfRow.eval(row).asInstanceOf[Long]).sum
    -      Statistics(sizeInBytes = sizeInBytes)
    +      Statistics(sizeInBytes = batchStats.value.longValue)
    --- End diff --
    
    Given that I changed the statistic and all tests still passed locally, I doubt we have
    a test for it. I'll take a look once I find some time to get back to this patch.



