drill-dev mailing list archives

From paul-rogers <...@git.apache.org>
Subject [GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort
Date Tue, 28 Feb 2017 00:06:45 GMT
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103333406
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -934,6 +1005,14 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
         long origInputBatchSize = estimatedInputBatchSize;
         estimatedInputBatchSize = Math.max(estimatedInputBatchSize, actualBatchSize);
     
    +    // The row width may end up as zero if all fields are nulls or some
    +    // other unusual situation. In this case, assume a width of 10 just
    +    // to avoid lots of special case code.
    +
    +    if (estimatedRowWidth == 0) {
    +      estimatedRowWidth = 10;
    --- End diff --
    
    This is a very peculiar case that came up in testing. It seems that we can have a row
with one column, and that one column is always null. Imagine a Parquet file that has 1 million
Varchars, all of which are null. In every batch, the row width will be 0. Since we often divide
by the row width, bad things happen. So, here, we arbitrarily say that if the row width comes
out as zero, just assume 10 bytes to avoid the need for a bunch of special-case calcs. (The
calcs are already too complex.)
    
    If there are 1000 columns, all of which are null, we would write 1000 "bit" (really byte)
vectors, so each row would be 1000 bytes wide. But, in such a case, the batch analyzer should
have come up with a number other than 0 for the row width.
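
    To make the divide-by-zero concern concrete, here is a minimal, standalone sketch of the
same guard. It is not the code in ExternalSortBatch: the class name `RowWidthGuard`, the
methods `effectiveRowWidth` and `rowsThatFit`, and the memory budget are all hypothetical
names chosen for illustration. Only the "treat a width of 0 as 10" rule comes from the diff
above.

```java
// Illustrative sketch only; names here are hypothetical, not Drill APIs.
final class RowWidthGuard {

  // Fallback width taken from the diff: assume 10 bytes when the estimate is 0.
  static final int FALLBACK_ROW_WIDTH = 10;

  // Returns the estimate unchanged unless it is zero (e.g. a single all-null column).
  static int effectiveRowWidth(int estimatedRowWidth) {
    return estimatedRowWidth == 0 ? FALLBACK_ROW_WIDTH : estimatedRowWidth;
  }

  // Example of a calculation that divides by the row width. Without the guard,
  // integer division by zero would throw ArithmeticException.
  static long rowsThatFit(long memoryBudgetBytes, int estimatedRowWidth) {
    return memoryBudgetBytes / effectiveRowWidth(estimatedRowWidth);
  }

  public static void main(String[] args) {
    System.out.println(rowsThatFit(1_000_000L, 0));    // prints 100000
    System.out.println(rowsThatFit(1_000_000L, 1000)); // prints 1000
  }
}
```

    Applying the guard once, where the estimate is updated, means the later calculations that
divide by the width need no zero checks of their own.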


