drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5491) NPE when reading a CSV file, with headers, but blank header line
Date Tue, 09 May 2017 05:03:04 GMT
Paul Rogers created DRILL-5491:

             Summary: NPE when reading a CSV file, with headers, but blank header line
                 Key: DRILL-5491
                 URL: https://issues.apache.org/jira/browse/DRILL-5491
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers

See DRILL-5490 for background.

Try this unit test case:

    FixtureBuilder builder = ClusterFixture.builder();

    try (ClusterFixture cluster = builder.build();
         ClientFixture client = cluster.clientFixture()) {
      TextFormatConfig csvFormat = new TextFormatConfig();
      csvFormat.fieldDelimiter = ',';
      csvFormat.skipFirstLine = false;
      csvFormat.extractHeader = true;
      cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
      String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
      // ... run sql using client ...
    }

The test can also be run as a query using your favorite client.

Using this input file:



(The first line is blank.)
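
For anyone reproducing this, the following is a minimal sketch of creating such a file. It assumes the file consists of a single empty line (the report only says that the first line is blank) and that the workspace root is /tmp/data, as in the test case above. The class and method names are hypothetical and not part of Drill.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BlankHeaderFile {

  // Writes csv/test7.csv under the given workspace root and returns its path.
  // Assumption: the file holds only a single empty line.
  static Path writeBlankHeaderCsv(Path workspaceRoot) throws IOException {
    Path csvDir = workspaceRoot.resolve("csv");
    Files.createDirectories(csvDir);
    Path file = csvDir.resolve("test7.csv");
    Files.write(file, "\n".getBytes(StandardCharsets.UTF_8));
    return file;
  }

  public static void main(String[] args) throws IOException {
    // "/tmp/data" matches the workspace defined in the test case.
    System.out.println(writeBlankHeaderCsv(Paths.get("/tmp/data")));
  }
}
```

The relative path csv/test7.csv matches the table name used in the query.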

The following is the result:

Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: NullPointerException

The {{RepeatedVarCharOutput}} class tries (but fails for the reasons outlined in DRILL-5490)
to detect this case.

The code crashes here in {{CompliantTextRecordReader.extractHeader()}}:

    String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();

Because of bad code in {{RepeatedVarCharOutput.getTextOutput()}}:

  public String [] getTextOutput () throws ExecutionSetupException {
    if (recordCount == 0 || fieldIndex == -1) {
      return null;
    }

    if (this.recordStart != characterData) {
      throw new ExecutionSetupException("record text was requested before finishing record");
    }
    // ...
  }

Since there is no text on the line, special code elsewhere (see DRILL-5490) elects not to
increment the {{recordCount}}. (BTW: {{recordCount}} is the total count across batches; the
in-batch count, {{batchIndex}}, was probably what was wanted here.) Since the count is zero,
we return null.

The author probably thought we'd get a zero-length record, in which case the second
if-statement would throw an exception. But see DRILL-5490 for why this code does not
actually work.

The result is one bug (not incrementing the record count) triggering another (returning
null), which masks a third ({{recordStart}} is not set correctly, so the exception would not
be thrown).
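
The null-return half of that chain can be sketched in isolation. The class below is a hypothetical stand-in, not Drill code: it models only the two counters used by the first check in {{getTextOutput()}}, and the guarded variant is one possible way to fail more clearly, not the actual fix.

```java
public class HeaderNpeSketch {

  // Hypothetical stand-in for RepeatedVarCharOutput; only the two
  // counters used by the first check are modeled.
  static class Output {
    final int recordCount;
    final int fieldIndex;

    Output(int recordCount, int fieldIndex) {
      this.recordCount = recordCount;
      this.fieldIndex = fieldIndex;
    }

    // Mirrors the first check quoted above: no record counted, so null.
    String[] getTextOutput() {
      if (recordCount == 0 || fieldIndex == -1) {
        return null;
      }
      return new String[] {"a", "b"}; // placeholder column names
    }
  }

  // Caller that trusts the result without checking it.
  static int unguardedFieldCount(Output out) {
    String[] fieldNames = out.getTextOutput();
    return fieldNames.length; // NullPointerException when fieldNames is null
  }

  // One possible guard (an assumption, not the Drill fix): fail with a
  // descriptive message instead of an NPE.
  static int guardedFieldCount(Output out) {
    String[] fieldNames = out.getTextOutput();
    if (fieldNames == null) {
      throw new IllegalStateException("header line is blank; no column names found");
    }
    return fieldNames.length;
  }

  public static void main(String[] args) {
    Output blankHeader = new Output(0, -1);
    try {
      unguardedFieldCount(blankHeader);
    } catch (NullPointerException e) {
      System.out.println("unguarded: NullPointerException");
    }
    try {
      guardedFieldCount(blankHeader);
    } catch (IllegalStateException e) {
      System.out.println("guarded: " + e.getMessage());
    }
  }
}
```

Either way the query fails, but the guarded form at least tells the user that the header line is the problem.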

All that bad code is just fun and games until we get an NPE, however.
