drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5457) Support Spill to Disk for the Hash Aggregate Operator
Date Thu, 01 Jun 2017 22:36:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033833#comment-16033833 ]

ASF GitHub Bot commented on DRILL-5457:

Github user Ben-Zvi commented on a diff in the pull request:

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
    @@ -285,8 +648,18 @@ public AggOutcome doWork() {
           // In the future HashAggregate may also need to perform some actions conditionally
           // in the outer try block.
    +      assert ! handlingSpills || currentIndex < Integer.MAX_VALUE;
           while (true) {
    +        // This would be called only once - after actual data arrives on incoming
    +        if ( schema == null && incoming.getRecordCount() > 0 ) {
    --- End diff --
    There is no code for OK_FIRST_NON_EMPTY, and the local field "schema" is used here as a flag meaning "setup not yet performed". That state does not always coincide with OK_NEW_SCHEMA: sometimes the second batch, returned with OK, is the first non-empty batch.
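The `schema == null` check in the diff acts as a lazy-initialization flag. Below is a minimal Java sketch of that pattern, assuming a heavily simplified model in which a batch is just a schema name plus a row count. The `Batch` record and `Aggregator` class are hypothetical illustrations, not Drill's HashAggTemplate. Setup runs exactly once, on the first batch that carries rows, whether or not that batch arrived with OK_NEW_SCHEMA (records require Java 16 or later).

```java
import java.util.List;

// Hypothetical, simplified sketch; not Drill's HashAggTemplate.
// A batch is reduced to a schema name and a row count.
public class LazySetupDemo {

  record Batch(String schema, int recordCount) {}

  static final class Aggregator {
    // null means "setup not yet performed", mirroring the role of the "schema" field.
    private String schema = null;
    private int setupCalls = 0;
    private long rowsProcessed = 0;

    void consume(Batch incoming) {
      // Delayed setup: runs once, when the first batch with actual rows arrives,
      // regardless of which outcome that batch was returned with.
      if (schema == null && incoming.recordCount() > 0) {
        schema = incoming.schema();
        setupCalls++;
      }
      if (schema == null) {
        return; // no data seen yet, so nothing to aggregate
      }
      rowsProcessed += incoming.recordCount();
    }

    int setupCalls() { return setupCalls; }
    long rowsProcessed() { return rowsProcessed; }
    String schema() { return schema; }
  }

  public static void main(String[] args) {
    Aggregator agg = new Aggregator();
    // First batch is empty; the second one carries the first rows.
    for (Batch b : List.of(new Batch("s1", 0), new Batch("s1", 3), new Batch("s1", 2))) {
      agg.consume(b);
    }
    System.out.println(agg.setupCalls() + " " + agg.rowsProcessed());
  }
}
```

With an empty first batch followed by batches of 3 and 2 rows, setup runs once and 5 rows are processed, so the program prints `1 5`.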
    Also, next() is a final method (in AbstractRecordBatch), which in turn invokes the next() methods of other classes extending RecordBatch (such as the new SpilledRecordBatch). Should we put the code that performs the delayed HashAgg setup there?
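The structure raised in this question, a final next() that delegates to an overridable hook, with one operator's hook pulling data through another batch's next(), can be sketched as follows. All names here (`BaseBatch`, `innerNext()`, `Source`, `Summer`) are hypothetical and only loosely modeled on the AbstractRecordBatch arrangement described above. Because next() cannot be overridden, this sketch places the one-time setup in the subclass hook; it is one possible arrangement, not a statement of how Drill resolves the question.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a final next() delegating to a hook; names are illustrative.
public class FinalNextDemo {

  enum Outcome { OK, NONE }

  static abstract class BaseBatch {
    private int nextCalls = 0;

    // final: subclasses cannot override it, so per-operator logic lives in innerNext().
    public final Outcome next() {
      nextCalls++; // shared bookkeeping applied to every operator
      return innerNext();
    }

    protected abstract Outcome innerNext();
    abstract int recordCount();
    int nextCalls() { return nextCalls; }
  }

  // Upstream stand-in (e.g. a reader of spilled batches): replays fixed row counts.
  static final class Source extends BaseBatch {
    private final Deque<Integer> pending;
    private int current = 0;

    Source(List<Integer> counts) { pending = new ArrayDeque<>(counts); }

    @Override protected Outcome innerNext() {
      if (pending.isEmpty()) { current = 0; return Outcome.NONE; }
      current = pending.poll();
      return Outcome.OK;
    }

    @Override int recordCount() { return current; }
  }

  // Blocking aggregate: its hook drains the upstream through the upstream's final next().
  static final class Summer extends BaseBatch {
    private final BaseBatch incoming;
    private boolean setupDone = false;
    private long total = 0;

    Summer(BaseBatch incoming) { this.incoming = incoming; }

    @Override protected Outcome innerNext() {
      Outcome out;
      while ((out = incoming.next()) == Outcome.OK) {
        // One-time setup sits in the hook, on the first batch that has rows.
        if (!setupDone && incoming.recordCount() > 0) { setupDone = true; }
        total += incoming.recordCount();
      }
      return out; // NONE once the upstream is exhausted
    }

    // One output row (the sum) once any data has been seen.
    @Override int recordCount() { return setupDone ? 1 : 0; }
    boolean setupDone() { return setupDone; }
    long total() { return total; }
  }

  public static void main(String[] args) {
    Source src = new Source(List.of(0, 3, 2));
    Summer sum = new Summer(src);
    sum.next();
    System.out.println(sum.total() + " " + src.nextCalls());
  }
}
```

The program prints `5 4`: the upstream's next() is called three times with data and once more to report NONE.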
    Even if next() were modified to return a new flag like OK_FIRST_NON_EMPTY, these flags are only checked in the code below from the second batch on. I am not sure where the code that reads the first incoming batch is.
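The OK_FIRST_NON_EMPTY idea can be sketched as a small tagging step on the producer side. In this hypothetical sketch (OK_FIRST_NON_EMPTY does not exist in Drill, and `FirstNonEmptyTagger` with its `tag()` method is illustrative), the first batch that carries rows is reported with the new outcome regardless of the outcome it would otherwise have had.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: OK_FIRST_NON_EMPTY is the proposed outcome, not an existing Drill value.
public class FirstNonEmptyDemo {

  enum Outcome { OK_NEW_SCHEMA, OK, OK_FIRST_NON_EMPTY, NONE }

  // Rewrites the outcome of the first batch that has rows; later batches pass through.
  static final class FirstNonEmptyTagger {
    private boolean seenNonEmpty = false;

    Outcome tag(Outcome raw, int recordCount) {
      if (!seenNonEmpty && recordCount > 0 && raw != Outcome.NONE) {
        seenNonEmpty = true;
        return Outcome.OK_FIRST_NON_EMPTY;
      }
      return raw;
    }
  }

  public static void main(String[] args) {
    FirstNonEmptyTagger tagger = new FirstNonEmptyTagger();
    Outcome[] raw = { Outcome.OK_NEW_SCHEMA, Outcome.OK, Outcome.OK, Outcome.NONE };
    int[] counts = { 0, 3, 2, 0 };
    List<Outcome> seen = new ArrayList<>();
    for (int i = 0; i < raw.length; i++) {
      seen.add(tagger.tag(raw[i], counts[i]));
    }
    System.out.println(seen);
  }
}
```

For the sequence OK_NEW_SCHEMA (0 rows), OK (3 rows), OK (2 rows), NONE, it prints `[OK_NEW_SCHEMA, OK_FIRST_NON_EMPTY, OK, NONE]`. Two trade-offs of this particular sketch: a first non-empty batch that arrived with OK_NEW_SCHEMA would lose that outcome, and the consumer would still need to check for the new flag on every batch it reads, including the very first one.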

> Support Spill to Disk for the Hash Aggregate Operator
> -----------------------------------------------------
>                 Key: DRILL-5457
>                 URL: https://issues.apache.org/jira/browse/DRILL-5457
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>             Fix For: 1.11.0
> Support gradual spilling of memory to disk as the available memory becomes too small to allow in-memory work for the Hash Aggregate Operator.

This message was sent by Atlassian JIRA
