drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-5530) IntVectors are sometimes not zero-filled on allocation
Date Sun, 21 May 2017 18:49:04 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Paul Rogers updated DRILL-5530:
-------------------------------
    Summary: IntVectors are sometimes not zero-filled on allocation  (was: IntVectors are
sometimes no zero-filled on allocation)

> IntVectors are sometimes not zero-filled on allocation
> ------------------------------------------------------
>
>                 Key: DRILL-5530
>                 URL: https://issues.apache.org/jira/browse/DRILL-5530
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>
> Much of Drill's vector code is based on the assumption that vectors are initialized to
0s.
> For example, consider the "bit" (is-set) vector used for nullable types. The vector is
presumed initialized to 0 so that all fields are set to not-set (e.g. null) by default. For
example, to allocate a new vector the code (slightly edited) is:
> {code}
>   private void allocateBytes(final long size) {
>     final int curSize = (int) size;
>     clear();
>     data = allocator.buffer(curSize);
>     data.setZero(0, data.capacity()); // Zero whole vector
>     allocationSizeInBytes = curSize;
>   }
> {code}
> The code to reallocate (grow) the vector zeros the new slots:
> {code}
>   public void reAlloc() {
>     final long newAllocationSize = allocationSizeInBytes * 2L;
>     final int curSize = (int)newAllocationSize;
>     final DrillBuf newBuf = allocator.buffer(curSize);
>     newBuf.setZero(0, newBuf.capacity());
>     newBuf.setBytes(0, data, 0, data.capacity()); // Zero new part
>     data.release();
>     data = newBuf;
>     allocationSizeInBytes = curSize;
>   }
> {code}
> However, it turns out that other vectors omit the initial zero step. Consider the {{IntVector}}:
> {code}
>   private void allocateBytes(final long size) {
>     final int curSize = (int)size;
>     clear();
>     data = allocator.buffer(curSize);
>     // Notice no call here to zero the buffer
>     data.readerIndex(0);
>     allocationSizeInBytes = curSize;
>   }
> {code}
> Now things start to get strange. If the buffer is newly allocated by Netty from the OS,
it will contain zeros. Thus, most test cases will get a zero buffer.
> Further, the reallocation for ints *does* zero the new half of the buffer:
> {code}
>   public void reAlloc() {
>     final long newAllocationSize = allocationSizeInBytes * 2L;
>     final DrillBuf newBuf = allocator.buffer((int)newAllocationSize);
>     newBuf.setBytes(0, data, 0, data.capacity());
>     final int halfNewCapacity = newBuf.capacity() / 2;
>     newBuf.setZero(halfNewCapacity, halfNewCapacity); // Zero new bytes
>     newBuf.writerIndex(data.writerIndex());
>     data.release(1);
>     data = newBuf;
>     allocationSizeInBytes = (int)newAllocationSize;
>   }
> {code}
> The result is that, most times, the bytes of the vector will be zero. But, the first
allocation may be non-zero if Netty reuses an existing block from Netty's free list.
> If any code assumes that values default to zero, that code will fail -- but only for
the first few bytes and only if run long enough for the buffer to be a "recycled" "dirty"
one.
> This was found by a test that filled vectors with runs of 'X' values for one test, followed
by another test that allocated an int vector and inspected its contents.
> This asymmetry is just asking for bugs to appear. To make clear that only bit vectors
are zero filled:
> * Remove the {{setZero()}} call when reallocating vectors.
> * When assertions are on, fill the buffer with dummy data (such as 0b1010_1010) to flush
out any code that happens to depend on zero values.
> * Code that needs to "back-fill" values (such as when columns are missing in some rows
in a reader) should do the back-filling explicitly rather than assuming zeros.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message