drill-issues mailing list archives

From "Boaz Ben-Zvi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch
Date Thu, 17 Aug 2017 23:33:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131478#comment-16131478
] 

Boaz Ben-Zvi commented on DRILL-5728:
-------------------------------------

Similar code is generated when the underlying value column is nullable (see below). In this
case the additional value vector may be needed, but it could probably be replaced by a bitset
instead of a bigint vector to save memory.

{code}
        public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
            throws SchemaChangeException
        {
            {
                NullableBigIntHolder out11 = new NullableBigIntHolder();
                {
                    out11 .isSet = vv8 .getAccessor().isSet((incomingRowIdx));
                    if (out11 .isSet == 1) {
                        out11 .value = vv8 .getAccessor().get((incomingRowIdx));
                    }
                }
                NullableBigIntHolder in = out11;
                work0 .value = vv1 .getAccessor().get((htRowIdx));
                BigIntHolder value = work0;
                work4 .value = vv5 .getAccessor().get((htRowIdx));
                BigIntHolder nonNullCount = work4;
                 
SumFunctions$NullableBigIntSum_add: {
    sout:
    {
        if (in.isSet == 0) {
            break sout;
        }
        nonNullCount.value = 1;
        value.value += in.value;
    }
}
 
                work0 = value;
                vv1 .getMutator().set((htRowIdx), work0 .value);
                work4 = nonNullCount;
                vv5 .getMutator().set((htRowIdx), work4 .value);
            }
        }
{code}
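
To make the bitset idea concrete, here is a minimal standalone sketch. It is not Drill's generated code and does not use Drill's value vector API: the class NullableSumSketch, its methods, and the use of java.util.BitSet are all assumptions made for illustration. A long[] plays the role of the "sum" vector (vv1), and the BitSet takes the place of the bigint "nonNullCount" vector (vv5).

```java
import java.util.BitSet;

/** Hypothetical sketch only; names and structure are not taken from Drill. */
final class NullableSumSketch {
    private final long[] sums;     // stands in for the "sum" vector (vv1)
    private final BitSet nonNull;  // stands in for the bigint "nonNullCount" vector (vv5)

    NullableSumSketch(int capacity) {
        sums = new long[capacity];
        nonNull = new BitSet(capacity);
    }

    /** Same logic as NullableBigIntSum_add: skip nulls, otherwise flag the group and accumulate. */
    void add(int htRowIdx, boolean isSet, long value) {
        if (!isSet) {
            return;                // null input: leave both the sum and the flag untouched
        }
        nonNull.set(htRowIdx);     // one bit instead of writing 1 into an 8-byte slot
        sums[htRowIdx] += value;
    }

    /** Returns null when the group has seen only null inputs. */
    Long result(int htRowIdx) {
        return nonNull.get(htRowIdx) ? Long.valueOf(sums[htRowIdx]) : null;
    }

    /** Payload bytes for one 8-byte flag per row. */
    static long bigIntFlagBytes(int rows) {
        return 8L * rows;
    }

    /** Payload bytes for one bit per row, rounded up to whole bytes. */
    static long bitFlagBytes(int rows) {
        return (rows + 7L) / 8L;
    }
}
```

For a batch of 64K rows, bigIntFlagBytes(65536) is 524288 (512K) while bitFlagBytes(65536) is 8192 (8K), ignoring object overhead. Whether a bitset fits into the generated-code and vector framework is a separate question that this sketch does not address.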


> Hash Aggregate: Useless bigint value vector in the values batch
> ---------------------------------------------------------------
>
>                 Key: DRILL-5728
>                 URL: https://issues.apache.org/jira/browse/DRILL-5728
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Codegen
>    Affects Versions: 1.11.0
>            Reporter: Boaz Ben-Zvi
>            Priority: Minor
>
>  When aggregating a non-nullable column (like *sum(l_partkey)* below), the code generation
creates an extra value vector (in addition to the actual "sum" vector) which is used as a
"nonNullCount".
>    This is useless (as the underlying column is non-nullable), and wastes considerable
memory (8 bytes * 64K rows = 512K for each aggregated value in a batch!)
> Example query:
> {{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by l_orderkey;}}
> And as can be seen in the generated code below, the bigint value vector *vv5* is only
used to hold a *1* flag to note "not null":
> {code}
>         public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
>             throws SchemaChangeException
>         {
>             {
>                 IntHolder out11 = new IntHolder();
>                 {
>                     out11 .value = vv8 .getAccessor().get((incomingRowIdx));
>                 }
>                 IntHolder in = out11;
>                 work0 .value = vv1 .getAccessor().get((htRowIdx));
>                 BigIntHolder value = work0;
>                 work4 .value = vv5 .getAccessor().get((htRowIdx));
>                 BigIntHolder nonNullCount = work4;
>                  
> SumFunctions$IntSum_add: {
>     nonNullCount.value = 1;
>     value.value += in.value;
> }
>  
>                 work0 = value;
>                 vv1 .getMutator().set((htRowIdx), work0 .value);
>                 work4 = nonNullCount;
>                 vv5 .getMutator().set((htRowIdx), work4 .value);
>             }
>         }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
