drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6126) Allocate memory for value vectors upfront in flatten operator
Date Thu, 22 Feb 2018 06:00:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372445#comment-16372445
] 

ASF GitHub Bot commented on DRILL-6126:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169860846
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
---
    @@ -266,12 +270,18 @@ public void addComplexWriter(ComplexWriter writer) {
         complexWriters.add(writer);
       }
     
    -  private boolean doAlloc() {
    -    //Allocate vv in the allocationVectors.
    +  private boolean doAlloc(int recordCount) {
    +
    +    //Allocate v in the allocationVectors.
         for (ValueVector v : this.allocationVectors) {
    -      if (!v.allocateNewSafe()) {
    -        return false;
    -      }
    +      // build vector initializer for the column.
    +      // This will iteratively include all nested columns underneath.
    +      RecordBatchSizer.ColumnSize colSize = flattenMemoryManager.getColumnSize(v.getField().getName());
    +      VectorInitializer initializer = new VectorInitializer();
    +      colSize.buildVectorInitializer(initializer);
    +      // Allocate memory for the vector. If it is map, it will allocate memory
    +      // for all nested child columns as well.
    +      initializer.allocateVector(v, "", recordCount);
    --- End diff --
    
    While this code can be made to work, it does introduce a more complex path than was intended.
The idea is that a `VectorInirializer` is a recursive structure. The top-level one holds hints
for the row. Nested instances can hold data for maps.
    
    Because this class was meant to be temporary, it holds "hints": the information is used
if available, defaults used if the hints are not available.
    
    So, a better approach would be to assemble a `VectorInitializer` for the output row in
one step. Then, apply it to the entire row in another step.
    
    Further if we do that, we don't have to create initializer objects for every vector allocation;
we can reuse the same set if the output row sizes don't change.
    
    Further, the code will be easier to reason about since we won't have two distinct paths.


> Allocate memory for value vectors upfront in flatten operator
> -------------------------------------------------------------
>
>                 Key: DRILL-6126
>                 URL: https://issues.apache.org/jira/browse/DRILL-6126
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> With recent changes to control batch size for flatten operator, we figure out row count
in the output batch based on memory. Since we know how many rows we are going to include in
the batch, we can also allocate the memory needed upfront instead of starting with initial
value (4096) and doubling, copying every time we need more. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message