drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-5602) Repeated List Vector fails to initialize the offset vector
Date Thu, 22 Jun 2017 05:07:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5602:
    Summary: Repeated List Vector fails to initialize the offset vector  (was: Vector corruption
when allocating a repeated, variable-width vector)

> Repeated List Vector fails to initialize the offset vector
> ----------------------------------------------------------
>                 Key: DRILL-5602
>                 URL: https://issues.apache.org/jira/browse/DRILL-5602
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
> The query in DRILL-5513 highlighted a problem described in DRILL-5594: that the external
sort did not properly allocate its spill batch vectors, and instead allowed them to grow by
doubling. While fixing that issue, a new issue became clear.
> The method that allocates a repeated map vector has a serious bug, as described
in DRILL-5530: value vectors do not zero-fill the first allocation for a vector (though subsequent
reallocs are zero-filled).
> If the code worked correctly, here is the behavior when writing to the first element
of the list:
> * Access the offset vector at offset 0. Should be 0.
> * Write the new value at that offset. Since the first offset is 0, the first value is
written at 0 in the value vector.
> * Write into offset 1 the value at offset 0 plus the length of the new value.
> But the offset vector is not initialized to zero. Instead, offset 0 contains the value
16 million. Now:
> * Access the offset vector at offset 0. Value is 16 million.
> * Write the new value at that offset. Write at position 16 million. This requires growing
the value vector from its present size to 16 MB.
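> The two scenarios above can be reproduced with a toy model. This is only a sketch under stated assumptions: {{OffsetWriteDemo}} is a hypothetical class that uses plain Java arrays rather than Drill's vectors, its 16-byte initial capacity is arbitrary, and it grows the value array by doubling.

```java
import java.util.Arrays;

/** Toy model of a repeated vector: an offsets array plus a growable value array. */
final class OffsetWriteDemo {
    final int[] offsets;
    byte[] values = new byte[16]; // arbitrary small initial capacity (assumption)

    OffsetWriteDemo(int groupCount, int firstOffset) {
        offsets = new int[groupCount + 1];
        // Stand-in for whatever the uninitialized buffer happens to contain.
        offsets[0] = firstOffset;
    }

    /** Writes value as entry `index`, following the three steps listed above. */
    void write(int index, byte[] value) {
        int start = offsets[index];          // 1. read the start offset
        int end = start + value.length;
        if (end > values.length) {           // grow by doubling until the value fits
            int newSize = values.length;
            while (newSize < end) {
                newSize *= 2;
            }
            values = Arrays.copyOf(values, newSize);
        }
        System.arraycopy(value, 0, values, start, value.length); // 2. write the value
        offsets[index + 1] = end;            // 3. record the end offset
    }

    public static void main(String[] args) {
        OffsetWriteDemo good = new OffsetWriteDemo(4, 0);
        good.write(0, new byte[] {1, 2, 3});
        System.out.println("clean offset 0: value array is " + good.values.length + " bytes");

        OffsetWriteDemo bad = new OffsetWriteDemo(4, 16_000_000);
        bad.write(0, new byte[] {1, 2, 3});
        System.out.println("garbage offset 0: value array is " + bad.values.length + " bytes");
    }
}
```

> In this model a clean first offset leaves the value array at its initial 16 bytes, while a first offset of 16,000,000 forces it to grow to 16,777,216 bytes just to hold a three-byte value.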
> The problem is here in {{RepeatedMapVector}}:
> {code}
>   public void allocateOffsetsNew(int groupCount) {
>     offsets.allocateNew(groupCount + 1);
>   }
> {code}
> Notice that there is no code to set the value at offset 0.
> Then, in the {{UInt4Vector}}:
> {code}
>   public void allocateNew(final int valueCount) {
>     allocateBytes(valueCount * 4);
>   }
>   private void allocateBytes(final long size) {
>     ...
>     data = allocator.buffer(curSize);
>     ...
> {code}
> The above eventually calls the Netty memory allocator, which explicitly states that,
for performance reasons, it does not zero-fill its buffers.
> The code works in small tests because the new buffer comes from Java direct memory, which
*does* zero-fill the buffer.
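> One way to guard against this is to set offset 0 explicitly right after allocation. The sketch below illustrates the idea with {{java.nio.ByteBuffer}} only; it is not Drill's code and not necessarily the fix that was applied. {{OffsetAllocDemo}}, {{dirtyBuffer}} and both allocate methods are hypothetical names, and {{dirtyBuffer}} merely simulates an allocator that hands back memory with stale contents.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class OffsetAllocDemo {
    private static final int OFFSET_WIDTH = 4; // bytes per offset entry, as in a UInt4 vector

    /** Hypothetical allocator that returns memory containing stale, non-zero data. */
    static ByteBuffer dirtyBuffer(int sizeBytes) {
        byte[] backing = new byte[sizeBytes];
        Arrays.fill(backing, (byte) 0x5A); // simulate leftover contents
        return ByteBuffer.wrap(backing);
    }

    /** Mirrors the reported method: allocates groupCount + 1 offsets, initializes nothing. */
    static ByteBuffer allocateOffsetsUninitialized(int groupCount) {
        return dirtyBuffer((groupCount + 1) * OFFSET_WIDTH);
    }

    /** Same allocation, but offset 0 is set to zero before the buffer is used. */
    static ByteBuffer allocateOffsetsInitialized(int groupCount) {
        ByteBuffer offsets = dirtyBuffer((groupCount + 1) * OFFSET_WIDTH);
        offsets.putInt(0, 0); // absolute put: the first list starts at position 0
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println("uninitialized offset 0 = " + allocateOffsetsUninitialized(4).getInt(0));
        System.out.println("initialized offset 0   = " + allocateOffsetsInitialized(4).getInt(0));
    }
}
```

> Only offset 0 needs to be set, because each later offset is written as a value is appended. Zero-filling the whole buffer would also work, at the cost of touching all of the newly allocated memory.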
