drill-dev mailing list archives

From ppadma <...@git.apache.org>
Subject [GitHub] drill pull request #749: DRILL-5266: Parquet returns low-density batches
Date Thu, 23 Feb 2017 14:46:45 GMT
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/749#discussion_r102726705
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VarLenBinaryReader.java ---
    @@ -70,33 +70,31 @@ public long readFields(long recordsToReadInThisPass, ColumnReader<?> firstColumn
         return recordsReadInCurrentPass;
       }
     
    -
       private long determineSizesSerial(long recordsToReadInThisPass) throws IOException {
    -    int lengthVarFieldsInCurrentRecord = 0;
    -    boolean exitLengthDeterminingLoop = false;
    -    long totalVariableLengthData = 0;
    -    long recordsReadInCurrentPass = 0;
    -    do {
    +
    +    // Can't read any more records than fixed width fields will fit.
    +    // Note: this calculation is very likely wrong; it is a simplified
    +    // version of earlier code, but probably needs even more attention.
    +
    +    int totalFixedFieldWidth = parentReader.getBitWidthAllFixedFields() / 8;
    +    long batchSize = parentReader.getBatchSize();
    +    if (totalFixedFieldWidth > 0) {
    +      recordsToReadInThisPass = Math.min(recordsToReadInThisPass, batchSize / totalFixedFieldWidth);
    --- End diff --
    
    Instead of fixing it up here, please do it outside the function and pass the correct value for recordsToReadInThisPass.
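    
    For illustration only, a minimal sketch of the refactor being suggested: cap recordsToReadInThisPass in the caller and pass the already-corrected value into determineSizesSerial(). This is not the actual patch; the caller shape below is assumed, loosely mirroring readFields() from the hunk header above, and it reuses parentReader and ColumnReader from the surrounding class.
    
        // Hypothetical sketch of a VarLenBinaryReader method; names mirror the
        // diff, but this is not the real Drill code.
        public long readFields(long recordsToReadInThisPass, ColumnReader<?> firstColumnStatus)
            throws IOException {
          // Cap the record count by how many fixed-width values fit in the batch,
          // before handing control to the size-determination logic.
          int totalFixedFieldWidth = parentReader.getBitWidthAllFixedFields() / 8;
          long batchSize = parentReader.getBatchSize();
          if (totalFixedFieldWidth > 0) {
            recordsToReadInThisPass = Math.min(recordsToReadInThisPass,
                batchSize / totalFixedFieldWidth);
          }
          // determineSizesSerial() now receives the corrected value and no longer
          // needs to adjust its own argument.
          return determineSizesSerial(recordsToReadInThisPass);
        }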


