drill-dev mailing list archives

From parthchandra <...@git.apache.org>
Subject [GitHub] drill pull request #826: DRILL-5379: Set Hdfs Block Size based on Parquet Bl...
Date Wed, 17 May 2017 00:11:26 GMT
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/826#discussion_r116886162
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java ---
    @@ -380,14 +384,21 @@ public void endRecord() throws IOException {
     
           // since ParquetFileWriter will overwrite empty output file (append is not supported)
           // we need to re-apply file permission
    -      parquetFileWriter = new ParquetFileWriter(conf, schema, path, ParquetFileWriter.Mode.OVERWRITE);
    +      if (useConfiguredBlockSize) {
    --- End diff --
    
    The constructor `ParquetFileWriter(conf, schema, path, ParquetFileWriter.Mode.OVERWRITE)`
    causes the Parquet file writer to set the file block size to the greater of the
    configured file system block size and 128 MB (the ParquetWriter's row group size).
    Drill's Parquet writer uses the block size specified in Drill's options to start
    a new Parquet row group when the limit is reached (see
    `ParquetRecordWriter.checkBlockSizeReached()`). If you set Drill's Parquet block size
    to the larger of the configured file system block size and 128 MB, the row group
    will match the HDFS block size, which is what the current code does.
    Isn't this what the original JIRA wanted?
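
To make the sizing rule above concrete, here is a minimal, self-contained sketch of the arithmetic. It does not call Parquet or Drill: the class `BlockSizeSketch` and the helpers `effectiveBlockSize` and `blockSizeReached` are hypothetical names invented for illustration, and the 128 MB constant is simply the row group size quoted in the comment.

```java
public class BlockSizeSketch {
    // 128 MB, the row group size quoted in the comment above.
    static final long ROW_GROUP_SIZE = 128L * 1024 * 1024;

    // Hypothetical helper: the file block size is the greater of the
    // configured file system block size and the 128 MB row group size.
    static long effectiveBlockSize(long fsBlockSize) {
        return Math.max(fsBlockSize, ROW_GROUP_SIZE);
    }

    // Hypothetical stand-in for a "start a new row group?" check. This is
    // an assumption for illustration, not Drill's actual implementation.
    static boolean blockSizeReached(long bytesBuffered, long blockSize) {
        return bytesBuffered >= blockSize;
    }

    public static void main(String[] args) {
        long smallFsBlock = 64L * 1024 * 1024;   // 64 MB file system block
        long largeFsBlock = 256L * 1024 * 1024;  // 256 MB file system block

        // A small file system block is raised to 128 MB.
        System.out.println(effectiveBlockSize(smallFsBlock));
        // A large file system block is kept as-is.
        System.out.println(effectiveBlockSize(largeFsBlock));

        // If the row group limit uses the same value, one row group
        // fills exactly one file block.
        long limit = effectiveBlockSize(largeFsBlock);
        System.out.println(blockSizeReached(limit - 1, limit));
        System.out.println(blockSizeReached(limit, limit));
    }
}
```

If the row group limit and the file block size come from the same `max(...)` expression, each row group should land in a single block, which is the outcome the comment argues the current code already produces.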


