drill-dev mailing list archives

From Ben-Zvi <...@git.apache.org>
Subject [GitHub] drill pull request #822: DRILL-5457: Spill implementation for Hash Aggregate
Date Thu, 01 Jun 2017 22:35:10 GMT
Github user Ben-Zvi commented on a diff in the pull request:

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
    @@ -285,8 +648,18 @@ public AggOutcome doWork() {
           // In the future HashAggregate may also need to perform some actions conditionally
           // in the outer try block.
    +      assert ! handlingSpills || currentIndex < Integer.MAX_VALUE;
           while (true) {
    +        // This would be called only once - after actual data arrives on incoming
    +        if ( schema == null && incoming.getRecordCount() > 0 ) {
    --- End diff --
    There is no code for OK_FIRST_NON_EMPTY, and the local field "schema" is used here as
a flag meaning "setup not yet performed". (This does not always coincide with OK_NEW_SCHEMA;
sometimes the second batch, which arrives with an OK, is the first non-empty batch.)
       Also, next() is a FINAL method (in AbstractRecordBatch), which in turn invokes the
next() methods of other classes extending RecordBatch (like the new SpilledRecordBatch).
Should we put the code that performs the delayed setup for the HashAgg there?
      Even if next() were modified to return a new flag like OK_FIRST_NON_EMPTY, these
flags are checked in the code below only from the second batch onward. I am not sure where
the code that reads the first incoming batch is ...
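
To make the pattern concrete, here is a minimal, self-contained sketch of using a null "schema" field as the "setup not yet performed" flag. Everything in it (the class name, the Outcome enum, the Batch type, the counters) is a hypothetical stand-in for illustration; it is not Drill's actual HashAggTemplate, AbstractRecordBatch, or IterOutcome code.

```java
import java.util.Arrays;
import java.util.List;

public class DelayedSetupSketch {
  // Hypothetical outcome flags; a simplified stand-in, not Drill's IterOutcome.
  public enum Outcome { OK_NEW_SCHEMA, OK, NONE }

  // Hypothetical batch: just an outcome flag and a record count.
  public static final class Batch {
    public final Outcome outcome;
    public final int recordCount;

    public Batch(Outcome outcome, int recordCount) {
      this.outcome = outcome;
      this.recordCount = recordCount;
    }
  }

  // null means "setup not yet performed"
  private String schema = null;
  public int setupCalls = 0;
  public int rowsAggregated = 0;

  public void doWork(List<Batch> incoming) {
    for (Batch batch : incoming) {
      if (batch.outcome == Outcome.NONE) {
        break; // no more data
      }
      // Runs only once, when the first non-empty batch arrives,
      // regardless of which outcome flag that batch carried.
      if (schema == null && batch.recordCount > 0) {
        schema = "schema-of-first-non-empty-batch"; // placeholder value
        setupCalls++;
      }
      rowsAggregated += batch.recordCount;
    }
  }

  public static void main(String[] args) {
    DelayedSetupSketch agg = new DelayedSetupSketch();
    agg.doWork(Arrays.asList(
        new Batch(Outcome.OK_NEW_SCHEMA, 0), // schema arrives with an empty batch
        new Batch(Outcome.OK, 3),            // first non-empty batch arrives with OK
        new Batch(Outcome.OK, 4),
        new Batch(Outcome.NONE, 0)));
    // prints "setupCalls=1 rowsAggregated=7"
    System.out.println("setupCalls=" + agg.setupCalls
        + " rowsAggregated=" + agg.rowsAggregated);
  }
}
```

In this sketch the setup is triggered by the second batch, which carries a plain OK, so a check keyed only on OK_NEW_SCHEMA would have run it against an empty batch.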

