drill-dev mailing list archives

From paul-rogers <...@git.apache.org>
Subject [GitHub] drill pull request #822: DRILL-5457: Spill implementation for Hash Aggregate
Date Sat, 27 May 2017 05:38:06 GMT
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/822#discussion_r118811987
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
    @@ -204,24 +293,157 @@ private int getNumPendingOutput() {
     
         @RuntimeOverridden
         public void setupInterior(@Named("incoming") RecordBatch incoming, @Named("outgoing") RecordBatch outgoing,
    -        @Named("aggrValuesContainer") VectorContainer aggrValuesContainer) {
    +        @Named("aggrValuesContainer") VectorContainer aggrValuesContainer) throws SchemaChangeException {
         }
     
         @RuntimeOverridden
    -    public void updateAggrValuesInternal(@Named("incomingRowIdx") int incomingRowIdx, @Named("htRowIdx") int htRowIdx) {
    +    public void updateAggrValuesInternal(@Named("incomingRowIdx") int incomingRowIdx, @Named("htRowIdx") int htRowIdx) throws SchemaChangeException{
         }
     
         @RuntimeOverridden
    -    public void outputRecordValues(@Named("htRowIdx") int htRowIdx, @Named("outRowIdx") int outRowIdx) {
    +    public void outputRecordValues(@Named("htRowIdx") int htRowIdx, @Named("outRowIdx") int outRowIdx) throws SchemaChangeException{
         }
       }
     
    +  /**
    +   * An internal class to replace "incoming" - instead scanning a spilled partition file
    +   */
    +  public class SpilledRecordbatch implements CloseableRecordBatch {
    --- End diff --
    
    This class implements `CloseableRecordBatch`, and therefore the `RecordBatch` interface. That interface is *VERY* confusing. Its name suggests it is just a "bundle of vectors" that holds records, but it is actually the definition of Drill's Volcano-like iterator protocol: it defines the methods needed to use your `SpilledRecordbatch` class as an operator. Since this class is not an operator, it does not need to implement that interface.
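    To make the distinction concrete, here is a minimal sketch of a Volcano-style pull protocol. The `Operator` interface, the `ListScan` and `SumConsumer` classes, and the `Outcome` enum are hypothetical simplifications (the outcome names are only loosely borrowed from Drill's `IterOutcome`); this is not Drill's `RecordBatch` API. What it shows is that an operator is something a downstream consumer drives by calling `next()` until the stream reports its end.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified operator contract; NOT Drill's RecordBatch API.
// Outcome names are loosely borrowed from Drill's IterOutcome for readability.
enum Outcome { OK_NEW_SCHEMA, OK, NONE }

interface Operator {
  Outcome next();          // advance to the next batch and report what happened
  List<Integer> batch();   // rows of the current batch (toy payload)
}

// Hypothetical leaf operator: emits pre-built batches, reporting a new schema on the first one.
class ListScan implements Operator {
  private final Iterator<List<Integer>> batches;
  private List<Integer> current = List.of();
  private boolean first = true;

  ListScan(List<List<Integer>> batches) {
    this.batches = batches.iterator();
  }

  @Override
  public Outcome next() {
    if (!batches.hasNext()) {
      return Outcome.NONE;            // end of stream
    }
    current = batches.next();
    if (first) {
      first = false;
      return Outcome.OK_NEW_SCHEMA;   // first batch carries the schema
    }
    return Outcome.OK;
  }

  @Override
  public List<Integer> batch() {
    return current;
  }
}

// Hypothetical downstream consumer: pulls from upstream until NONE.
class SumConsumer {
  static long drain(Operator upstream) {
    long sum = 0;
    while (upstream.next() != Outcome.NONE) {
      for (int v : upstream.batch()) {
        sum += v;
      }
    }
    return sum;
  }
}
```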
    
    In fact, it is not clear this class needs a supertype at all. It already has a container member to hold the vectors, and since it is not an operator it does not need most of the vector-access methods; any that are needed can be called on the container itself.
    
    Clearly a spilled batch need not follow the `next()` iterator protocol.
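    One possible shape for the alternative, as a minimal sketch assuming a toy spill format (per batch: a row count followed by that many `int` values): the reader owns a container and exposes a plain "load the next batch" call plus `close()`, with no iterator outcomes. `BatchContainer`, `SpilledPartitionReader`, and `SpillFormat` are hypothetical names; `BatchContainer` only stands in for Drill's `VectorContainer`, and the real spill format is not shown in this diff.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a vector container: holds the rows of one batch.
class BatchContainer {
  private int[] values = new int[0];

  int getRecordCount() { return values.length; }
  int get(int row) { return values[row]; }
  void load(int[] newValues) { values = newValues; }
  void clear() { values = new int[0]; }
}

// Hypothetical reader for one spilled partition. It HAS a container (composition)
// instead of implementing an operator/iterator interface.
class SpilledPartitionReader implements AutoCloseable {
  private final DataInputStream in;
  private final BatchContainer container = new BatchContainer();

  SpilledPartitionReader(InputStream spillStream) {
    this.in = new DataInputStream(spillStream);
  }

  // Loads the next batch into the container; returns false once the stream is exhausted.
  boolean readBatch() throws IOException {
    int rowCount;
    try {
      rowCount = in.readInt();          // toy format: row count first
    } catch (EOFException e) {
      container.clear();                // no more batches
      return false;
    }
    int[] rows = new int[rowCount];
    for (int i = 0; i < rowCount; i++) {
      rows[i] = in.readInt();           // then that many values
    }
    container.load(rows);
    return true;
  }

  BatchContainer container() { return container; }

  @Override
  public void close() throws IOException {
    container.clear();
    in.close();
  }
}

// Hypothetical helper that writes batches in the same toy format (for demos and tests).
final class SpillFormat {
  static byte[] write(int[]... batches) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int[] batch : batches) {
        out.writeInt(batch.length);
        for (int v : batch) {
          out.writeInt(v);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```

    In this sketch the hash aggregate, not a downstream operator, decides when to read a spilled partition back, so a boolean "loaded / end of file" result is enough; the caller then reads the row count and values directly from `container()`.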