drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5491) NPE when reading a CSV file, with headers, but blank header line
Date Tue, 09 May 2017 05:03:04 GMT
Paul Rogers created DRILL-5491:
----------------------------------

             Summary: NPE when reading a CSV file, with headers, but blank header line
                 Key: DRILL-5491
                 URL: https://issues.apache.org/jira/browse/DRILL-5491
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers


See DRILL-5490 for background.

Try this unit test case:

{code}
    // Limit parallelism to keep the reproduction simple.
    FixtureBuilder builder = ClusterFixture.builder()
        .maxParallelization(1);

    try (ClusterFixture cluster = builder.build();
         ClientFixture client = cluster.clientFixture()) {
      // CSV format that treats the first line of each file as the header.
      TextFormatConfig csvFormat = new TextFormatConfig();
      csvFormat.fieldDelimiter = ',';
      csvFormat.skipFirstLine = false;
      csvFormat.extractHeader = true;
      // Define the dfs.data workspace at /tmp/data using the CSV format above.
      cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
      String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
      client.queryBuilder().sql(sql).printCsv();
    }
{code}

The test can also be run as a query using your favorite client.

Using this input file:

{code}

a,b,c
d,e,f
{code}

(The first line is blank.)

The following is the result:

{code}
Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: NullPointerException
{code}

The {{RepeatedVarCharOutput}} class tries (but fails for the reasons outlined in DRILL-5490)
to detect this case.

The code crashes here in {{CompliantTextRecordReader.extractHeader()}}:

{code}
    String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();
{code}

The null array comes from bad code in {{RepeatedVarCharOutput.getTextOutput()}}:

{code}
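  public String [] getTextOutput () throws ExecutionSetupException {
    // Returns null, not an empty array, when no record was counted.
    // This is the path taken for a blank header line.
    if (recordCount == 0 || fieldIndex == -1) {
      return null;
    }

    // Meant to reject a request for an unfinished record; because recordStart
    // is not maintained correctly (see DRILL-5490), this check does not fire.
    if (this.recordStart != characterData) {
      throw new ExecutionSetupException("record text was requested before finishing record");
    }
    // ... remainder of the method omitted ...
  }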

Since there is no text on the line, special code elsewhere (see DRILL-5490) elects not to
increment the {{recordCount}}. (BTW: {{recordCount}} is the total count across batches; the
in-batch count, {{batchIndex}}, was probably wanted here.) Since the count is zero, we return null.

The author probably expected a zero-length record instead, in which case the second if-statement
would throw an exception. But see DRILL-5490 for why that check does not actually work.
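
The failure path can be reproduced outside Drill with a small model. This is a hypothetical sketch, not Drill's code: {{HeaderOutputModel}} keeps only the two fields the quoted guard reads, and the static {{extractHeader()}} stands in for the caller, assuming it uses the array right after fetching it. With a blank header line nothing is counted, so the guard returns null and the first use of the array throws.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the header path; not Drill's real classes.
public class BlankHeaderNpe {

  static class HeaderOutputModel {
    int recordCount = 0;  // stays 0: a blank line is never counted as a record
    int fieldIndex = -1;  // -1 means no field was started
    final List<String> fields = new ArrayList<>();

    // Mirrors the first guard in the quoted getTextOutput(): no record => null.
    String[] getTextOutput() {
      if (recordCount == 0 || fieldIndex == -1) {
        return null;
      }
      return fields.toArray(new String[0]);
    }
  }

  // Stand-in for the caller: it assumes the returned array is never null.
  static int extractHeader(HeaderOutputModel hOutput) {
    String[] fieldNames = hOutput.getTextOutput();
    return fieldNames.length;  // throws NullPointerException when fieldNames is null
  }

  public static void main(String[] args) {
    HeaderOutputModel blankHeader = new HeaderOutputModel();
    try {
      extractHeader(blankHeader);
      System.out.println("no exception");
    } catch (NullPointerException e) {
      System.out.println("NullPointerException");
    }
  }
}
{code}

Running {{main}} prints {{NullPointerException}}, matching the system error reported above.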

The result is one bug (not incrementing the record count), triggering another (returning a
null), which masks a third ({{recordStart}} is not set correctly, so the exception would not
be thrown).
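
Independent of the DRILL-5490 fixes, the caller could stop the chain at the crash site by treating a missing or empty header as a data error. The sketch below assumes a hypothetical {{fieldNamesOrFail}} helper and uses {{IllegalStateException}} only as a stand-in; an actual Drill patch would more likely report the problem through Drill's own user-facing error mechanism, and the message text is illustrative.

{code:java}
import java.util.Arrays;

// Hypothetical defensive check for the header-extraction caller; not Drill's code.
public class HeaderGuard {

  // Returns the header field names, or fails with a descriptive message
  // instead of letting a null array escape and cause an NPE later.
  static String[] fieldNamesOrFail(String[] fieldNames, String fileName) {
    if (fieldNames == null || fieldNames.length == 0) {
      throw new IllegalStateException(
          "File " + fileName + ": header line is empty, cannot extract column names");
    }
    return fieldNames;
  }

  public static void main(String[] args) {
    // A well-formed header passes through unchanged.
    System.out.println(Arrays.toString(fieldNamesOrFail(new String[] {"a", "b", "c"}, "test7.csv")));
    // A null header (the blank-line case) now yields a clear error instead of an NPE.
    try {
      fieldNamesOrFail(null, "test7.csv");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
{code}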

All that bad code is just fun and games until we get an NPE, however.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
