drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5490) RepeatedVarCharOutput.recordStart becomes corrupted on realloc
Date Tue, 09 May 2017 02:42:04 GMT
Paul Rogers created DRILL-5490:

             Summary: RepeatedVarCharOutput.recordStart becomes corrupted on realloc
                 Key: DRILL-5490
                 URL: https://issues.apache.org/jira/browse/DRILL-5490
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Priority: Minor

The class {{RepeatedVarCharOutput}}, used to read text data into the {{columns}} array, has
a member called {{recordStart}} which is supposed to track the memory offset of the start
of the record data in the {{columns}} VarChar array. However, the logic for the member is
very wrong. First, it is initialized based on an uninitialized variable:

  public void startBatch() {
    this.recordStart = characterDataOriginal;

  private void loadVarCharDataAddress(){
    this.characterDataOriginal = buf.memoryAddress();

Second, the class keeps track of actual memory addresses for the VarChar data. When data would
exceed the buffer, memory is reallocated using the following method. But, this method *does
not* adjust the {{recordStart}} member, which now points to a bogus memory address:

  private void expandVarCharData(){
    long diff = characterData - characterDataOriginal;
    characterData += diff;

Fortunately, like so many bugs in this class, these bugs are offset by the fact that the variable
is never used in a way that cause obvious problems (else these issues would have been found
sooner.) The only real use is:

  public boolean rowHasData() {
    return recordStart < characterData;

Which, it turns out, is used only at EOF in {{TextReader.parseRecord()}}:

    } catch(StreamFinishedPseudoException e) {
      // if we've written part of a field or all of a field, we should send this row.
      if (fieldsWritten == 0 && !output.rowHasData()) {
        throw e;

Thus, the bug will cause a failure only if:

* A batch is resized, and
* The new address is lower than the original address, and
* It is the last batch in the file.

Because of the low probability of hitting the bug, the priority is set to Minor.

This message was sent by Atlassian JIRA

View raw message