drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5490) RepeatedVarCharOutput.recordStart becomes corrupted on realloc
Date Tue, 09 May 2017 02:42:04 GMT
Paul Rogers created DRILL-5490:
----------------------------------

             Summary: RepeatedVarCharOutput.recordStart becomes corrupted on realloc
                 Key: DRILL-5490
                 URL: https://issues.apache.org/jira/browse/DRILL-5490
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Priority: Minor


The class {{RepeatedVarCharOutput}}, used to read text data into the {{columns}} array, has
a member called {{recordStart}} which is supposed to track the memory offset of the start
of the record data in the {{columns}} VarChar array. However, the logic for the member is
very wrong. First, it is initialized based on an uninitialized variable:

{{code}}
  public void startBatch() {
    this.recordStart = characterDataOriginal;
    ...
    loadVarCharDataAddress();
  }

  private void loadVarCharDataAddress(){
    ...
    this.characterDataOriginal = buf.memoryAddress();
    ...
  }
{{code}}

Second, the class keeps track of actual memory addresses for the VarChar data. When data would
exceed the buffer, memory is reallocated using the following method. But, this method *does
not* adjust the {{recordStart}} member, which now points to a bogus memory address:

{code}
  private void expandVarCharData(){
    vector.getDataVector().reAlloc();
    long diff = characterData - characterDataOriginal;
    loadVarCharDataAddress();
    characterData += diff;
  }
{code}

Fortunately, like so many bugs in this class, these bugs are offset by the fact that the variable
is never used in a way that cause obvious problems (else these issues would have been found
sooner.) The only real use is:

{code}
  @Override
  public boolean rowHasData() {
    return recordStart < characterData;
  }
{code}

Which, it turns out, is used only at EOF in {{TextReader.parseRecord()}}:

{code}
    } catch(StreamFinishedPseudoException e) {
      // if we've written part of a field or all of a field, we should send this row.
      if (fieldsWritten == 0 && !output.rowHasData()) {
        throw e;
      }
    }
{code}

Thus, the bug will cause a failure only if:

* A batch is resized, and
* The new address is lower than the original address, and
* It is the last batch in the file.

Because of the low probability of hitting the bug, the priority is set to Minor.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message