drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5598) AllocationHelper.allocateNew ignores maps, arrays
Date Tue, 20 Jun 2017 02:23:00 GMT
Paul Rogers created DRILL-5598:

             Summary: AllocationHelper.allocateNew ignores maps, arrays
                 Key: DRILL-5598
                 URL: https://issues.apache.org/jira/browse/DRILL-5598
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
             Fix For: 1.11.0

The method {{VectorAccessibleUtilities.allocateVectors()}} is used, among various other places,
to allocate vectors when the external sort creates a spill batch.

This method does not allocate space for repeated vectors or for vectors contained in maps, so
those vectors start life with a very small buffer. This causes repeated doublings as data is
loaded into the vectors:

BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [32768] -> [65536]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [16384] -> [32768]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [16384] -> [32768]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
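To show why the doublings above matter, here is a small, hypothetical Java model (not Drill code) of a buffer that starts at some initial size and doubles until it can hold the data. It assumes that each reallocation copies the existing contents into the new, larger buffer. The byte sizes come from the log lines above, except for the 4096-byte starting size, which is an illustrative assumption.

```java
// Hypothetical model, not part of Drill: a buffer that starts at
// initialBytes and doubles whenever more space is needed. Assumes every
// reallocation copies the old contents into the new buffer.
public class DoublingCost {

    /** Number of doublings needed to grow from initialBytes to at least neededBytes. */
    static int doublings(long initialBytes, long neededBytes) {
        int count = 0;
        long size = initialBytes;
        while (size < neededBytes) {
            size *= 2;
            count++;
        }
        return count;
    }

    /** Total bytes copied across all doublings (each copies the old buffer). */
    static long bytesCopied(long initialBytes, long neededBytes) {
        long copied = 0;
        long size = initialBytes;
        while (size < neededBytes) {
            copied += size; // old buffer contents move into the larger buffer
            size *= 2;
        }
        return copied;
    }

    public static void main(String[] args) {
        // BigIntVector log line: 32768 -> 65536 bytes.
        System.out.println(doublings(32768, 65536));   // 1
        System.out.println(bytesCopied(32768, 65536)); // 32768
        // A vector that starts much smaller pays for several doublings.
        System.out.println(doublings(4096, 65536));    // 4
        System.out.println(bytesCopied(4096, 65536));  // 61440
    }
}
```

The copied bytes add up to nearly the final buffer size, and that cost repeats for every under-allocated vector in every spill batch. Allocating the right size up front avoids it.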

Maps can be handled by iterating over the contained vectors. Arrays and VarChars are harder,
since the code needs some hint about data size. Hard-coded hints are available (the assumption
that VarChar columns are 50 characters wide and that arrays have 10 elements). A better approach
would be for the operator that allocates a new batch to pass in metadata about sizes extracted
from previously-seen batches.
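The approach described above can be sketched in Java under stated assumptions. The classes here (`SizeHints`, `FixedVector`, `VarCharVector`, `RepeatedVector`, `MapVector`) are a hypothetical, simplified model and not Drill's `ValueVector` classes or the real `AllocationHelper` API. Maps recurse into their members. Arrays size their child from an elements-per-array hint. VarChars size their data from a width hint. The defaults mirror the 50-character and 10-element guesses, with width treated as bytes. The hypothetical `fromObserved` factory shows how averages from earlier batches could replace those guesses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified vector model for illustration only; these are
// not Drill's ValueVector classes or the real AllocationHelper API.
public class AllocationSketch {

    /** Size hints; defaults mirror the hard-coded guesses in the issue text. */
    static final class SizeHints {
        final int varCharWidth;     // assumed average VarChar width (bytes here)
        final int elementsPerArray; // assumed average array length
        SizeHints(int varCharWidth, int elementsPerArray) {
            this.varCharWidth = varCharWidth;
            this.elementsPerArray = elementsPerArray;
        }
        static SizeHints defaults() { return new SizeHints(50, 10); }

        /** Hints derived from totals seen in earlier batches; falls back to defaults. */
        static SizeHints fromObserved(long varCharBytes, long varCharValues,
                                      long arrayElements, long arrays) {
            int width = varCharValues == 0 ? 50
                : (int) Math.ceil((double) varCharBytes / varCharValues);
            int elems = arrays == 0 ? 10
                : (int) Math.ceil((double) arrayElements / arrays);
            return new SizeHints(width, elems);
        }
    }

    interface Vector { void allocate(int rowCount, SizeHints hints); }

    /** Fixed-width column: rowCount * width bytes. */
    static final class FixedVector implements Vector {
        final int width; long allocatedBytes;
        FixedVector(int width) { this.width = width; }
        public void allocate(int rowCount, SizeHints hints) {
            allocatedBytes = (long) rowCount * width;
        }
    }

    /** Variable-width column: data sized from the width hint, plus 4-byte offsets. */
    static final class VarCharVector implements Vector {
        long dataBytes, offsetBytes;
        public void allocate(int rowCount, SizeHints hints) {
            dataBytes = (long) rowCount * hints.varCharWidth;
            offsetBytes = (long) (rowCount + 1) * 4;
        }
    }

    /** Repeated (array) column: per-row offsets; child sized for rowCount * elements. */
    static final class RepeatedVector implements Vector {
        final Vector child; long offsetBytes;
        RepeatedVector(Vector child) { this.child = child; }
        public void allocate(int rowCount, SizeHints hints) {
            offsetBytes = (long) (rowCount + 1) * 4;
            child.allocate(rowCount * hints.elementsPerArray, hints);
        }
    }

    /** Map column: no storage of its own; recurse into every member. */
    static final class MapVector implements Vector {
        final List<Vector> members = new ArrayList<>();
        public void allocate(int rowCount, SizeHints hints) {
            for (Vector member : members) member.allocate(rowCount, hints);
        }
    }

    /** Allocate every top-level column, recursing through maps and arrays. */
    static void allocateAll(List<Vector> columns, int rowCount, SizeHints hints) {
        for (Vector column : columns) column.allocate(rowCount, hints);
    }

    public static void main(String[] args) {
        FixedVector bigint = new FixedVector(8);
        VarCharVector name = new VarCharVector();
        FixedVector element = new FixedVector(4);
        MapVector map = new MapVector();
        map.members.add(name);
        map.members.add(new RepeatedVector(element));

        allocateAll(List.of(bigint, map), 4096, SizeHints.defaults());
        System.out.println(bigint.allocatedBytes);  // 32768
        System.out.println(name.dataBytes);         // 204800
        System.out.println(element.allocatedBytes); // 163840
    }
}
```

Vectors nested inside the map and the array are sized up front here instead of starting small and doubling. Real Drill vectors also carry "bits" buffers for nullability, which this sketch leaves out.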

This message was sent by Atlassian JIRA
