spark-reviews mailing list archives

From dongjoon-hyun <...@git.apache.org>
Subject [GitHub] spark pull request #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReade...
Date Tue, 22 May 2018 16:03:37 GMT
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21295#discussion_r189960120
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
---
    @@ -147,7 +147,8 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
         this.sparkSchema = StructType$.MODULE$.fromString(sparkRequestedSchemaString);
         this.reader = new ParquetFileReader(
             configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
    -    for (BlockMetaData block : blocks) {
    +    // use the blocks from the reader in case some do not match filters and will not be read
    --- End diff --
    
    It looks correct to me, too. However, this comment isn't clear.
    - If the comment is correct only in Parquet 1.10.0, please fix the comment.
    - If the comment is correct, the failure should occur in Apache Spark 2.3.X (with old Parquet). Why don't we fix that in 2.3.1? This is [my original suggestion](https://github.com/apache/spark/pull/21295#issuecomment-388656852).
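    The issue the diff addresses can be sketched in plain Java. This is a minimal illustration (the `Block` class is a hypothetical stand-in for Parquet's `BlockMetaData`, and the filtering step stands in for the row-group filtering that `ParquetFileReader` performs internally): if `totalRowCount` is summed over the footer's full block list while the reader only returns rows from blocks that survived the filters, the counts diverge.

    ```java
    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    public class RowCountSketch {
        // Hypothetical stand-in for org.apache.parquet.hadoop.metadata.BlockMetaData.
        static class Block {
            final long rowCount;
            final boolean matchesFilter;
            Block(long rowCount, boolean matchesFilter) {
                this.rowCount = rowCount;
                this.matchesFilter = matchesFilter;
            }
        }

        public static void main(String[] args) {
            // The footer lists every row group in the file.
            List<Block> footerBlocks = Arrays.asList(
                new Block(100, true), new Block(200, false), new Block(50, true));

            // The reader keeps only row groups that pass the filters,
            // analogous to the blocks held by ParquetFileReader.
            List<Block> readerBlocks = footerBlocks.stream()
                .filter(b -> b.matchesFilter)
                .collect(Collectors.toList());

            // Buggy: counts rows in row groups the reader will never read.
            long fromFooter = footerBlocks.stream().mapToLong(b -> b.rowCount).sum();
            // Fixed: counts only rows in row groups that will actually be read.
            long fromReader = readerBlocks.stream().mapToLong(b -> b.rowCount).sum();

            System.out.println("fromFooter=" + fromFooter);
            System.out.println("fromReader=" + fromReader);
        }
    }
    ```

    With the footer-based count, the reader would report fewer rows than expected and fail consistency checks; iterating the reader's own blocks keeps the two in agreement.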



---


