drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
Date Tue, 28 Feb 2017 00:06:46 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886813#comment-15886813 ]

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103333406
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -934,6 +1005,14 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
         long origInputBatchSize = estimatedInputBatchSize;
         estimatedInputBatchSize = Math.max(estimatedInputBatchSize, actualBatchSize);
     
    +    // The row width may end up as zero if all fields are nulls or some
    +    // other unusual situation. In this case, assume a width of 10 just
    +    // to avoid lots of special case code.
    +
    +    if (estimatedRowWidth == 0) {
    +      estimatedRowWidth = 10;
    --- End diff --
    
    This is a very peculiar case that came up in testing. It seems that we can have a row
with one column, and that one column is always null. Imagine a Parquet file that has 1 million
Varchars, all of which are null. In every batch, the row width will be 0. Since we often divide
by the row width, bad things happen. So, here, we arbitrarily say that if the row width comes
out as zero, just assume 10 bytes to avoid the need for a bunch of special-case calcs. (The
calcs are already too complex.)
    
    If there are 1000 columns, all of which are null, we would write 1000 "bit" (really byte)
vectors, so each row would be 1000 bytes wide. But, in such a case, the batch analyzer should
have come up with a number other than 0 for the row width.
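
    The guard described above can be sketched in isolation as follows. This is only an
illustration, not the code in ExternalSortBatch: the class name, the helper methods, and the
memory-budget calculation are hypothetical, and only the fallback width of 10 comes from the
diff.

```java
// Minimal sketch of a zero-row-width guard. All names here are hypothetical;
// only the fallback value of 10 is taken from the diff under review.
class RowWidthGuard {

  // Assumed fallback width, in bytes, used when the estimate is zero.
  static final int DEFAULT_ROW_WIDTH = 10;

  // Returns the estimate unchanged unless it is zero, in which case the
  // fallback is substituted so later divisions are safe.
  static int safeRowWidth(int estimatedRowWidth) {
    return estimatedRowWidth == 0 ? DEFAULT_ROW_WIDTH : estimatedRowWidth;
  }

  // Hypothetical downstream calculation: how many rows fit in a memory
  // budget. Without the guard, a width of zero would throw
  // ArithmeticException on this integer division.
  static long rowsPerBatch(long batchMemoryBytes, int estimatedRowWidth) {
    return batchMemoryBytes / safeRowWidth(estimatedRowWidth);
  }

  public static void main(String[] args) {
    System.out.println(rowsPerBatch(1_000_000L, 0));   // all-null column case
    System.out.println(rowsPerBatch(1_000_000L, 50));  // ordinary case
  }
}
```

    Applying the fallback once, where the estimate is updated, keeps every later division
free of its own zero check, which is the point of the change.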


> Roll-up of final fixes for managed sort
> ---------------------------------------
>
>                 Key: DRILL-5284
>                 URL: https://issues.apache.org/jira/browse/DRILL-5284
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, extensive testing
has identified a number of minor fixes and improvements. Given the long PR cycles, it is not
practical to spend a week or two on a separate PR for each fix. This ticket is a roll-up of
those fixes. Small fixes are listed here; larger items appear as sub-tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
