drill-issues mailing list archives

From "Daniel Barclay (Drill) (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
Date Sun, 01 Nov 2015 22:10:28 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983410#comment-14983410 ]

Daniel Barclay (Drill) edited comment on DRILL-2288 at 11/1/15 10:10 PM:
-------------------------------------------------------------------------

Chain of bugs and problems encountered and (partially) addressed:

1.  {{ScanBatch.next()}} returned {{NONE}} without ever returning {{OK_NEW_SCHEMA}} for a
source having zero rows (so downstream operators didn't get its schema, even for static-schema
sources, or even get a trigger to update their own schemas).

2.  {{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was not documented
clearly (so developers didn't know exactly what to expect or provide).

3.  {{IteratorValidatorBatchIterator}} didn't validate the sequence of {{IterOutcome}} values
(so developers weren't notified about incorrect results).

4.  {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} correctly (so
it reported spurious/incorrect schema-change and/or empty-/non-empty input exceptions).

5.  {{ScanBatch.Mutator.isNewSchema()}} didn't handle a short-circuit OR ("{{||}}") correctly
in calling {{SchemaChangeCallBack.getSchemaChange()}} (so it didn't reset nested schema-change
state, and so caused spurious {{OK_NEW_SCHEMA}} notifications and downstream exceptions).

6.  {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field already existed
in the batch (so in that case it forcibly changed the type to {{NullableIntVector}}, causing
schema changes and downstream exceptions). \[Note:  DRILL-2288 does not address other problems
with {{NullableIntVector}} dummy columns from {{JsonRecordReader}}.]

7.  HBase tests used only one table region, ignoring known problems with multi-region HBase
tables (so latent {{HBaseRecordReader}} problems were left undetected and unresolved).  \[Note:
DRILL-2288 addresses only one test table; increasing the number of regions on the other test
tables exposed at least one other problem, and others remain.]

8.  {{HBaseRecordReader}} didn't create a {{MapVector}} for every column family (so {{NullableIntVector}}
dummy columns got created, causing spurious schema changes and downstream exceptions).

9.  Some {{RecordBatch}} classes didn't reset their record counts to zero ({{OrderedPartitionRecordBatch.recordCount}},
{{ProjectRecordBatch.recordCount}}, and/or {{TopNBatch.recordCount}}) (so downstream code
tried to access elements of (correctly) empty vectors, yielding {{IndexOutOfBoundsException}}
with a message like "... {{range (0, 0)}}").

10.  {{RecordBatchLoader}}'s record count was not reset to zero by {{UnorderedReceiverBatch}}
(so, again, downstream code tried to access elements of (correctly) empty vectors, yielding
{{IndexOutOfBoundsException}} with a message like "... {{range (0, 0)}}").

11.  {{MapVector.load(...)}} left some existing vectors empty, not matching the returned length
and the length of sibling vectors (so {{MapVector.getObject(int)}} got {{IndexOutOfBoundsException}}
with a message like "... {{range (0, 0)}}").  \[Note: DRILL-2288 does not address the root problem.]

12. {{BaseTestQuery.printResult(...)}} skipped deallocation calls in the case of a zero-record
record batch (so when it read a zero-row record batch, it caused a memory leak reported at
Drillbit shutdown time).

13. {{TestHBaseProjectPushDown.testRowKeyAndColumnPushDown()}} used delimited identifiers
of a form (with a period) that Drill can't handle (so the test failed when it ran with
multiple fragments).
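
For items 1 and 3, the kind of ordering check involved can be sketched as follows. This is only an illustration under stated assumptions: {{IterOutcomeSketch}}, {{Outcome}}, and {{isValid}} are hypothetical names, the enum is a simplified three-value stand-in for {{RecordBatch.IterOutcome}}, and the two rules checked (a schema notification must come before any data or end-of-data, and nothing follows {{NONE}}) are a reduced reading of the protocol, not the actual {{IteratorValidatorBatchIterator}} logic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are Drill classes.
final class IterOutcomeSketch {

  // Simplified stand-in for RecordBatch.IterOutcome (assumed subset of values).
  enum Outcome { OK_NEW_SCHEMA, OK, NONE }

  // Returns true if the sequence obeys the two assumed ordering rules:
  // OK_NEW_SCHEMA must precede any OK or NONE, and nothing may follow NONE.
  static boolean isValid(List<Outcome> sequence) {
    boolean sawSchema = false;
    boolean finished = false;
    for (Outcome outcome : sequence) {
      if (finished) {
        return false; // an outcome was returned after NONE
      }
      switch (outcome) {
        case OK_NEW_SCHEMA:
          sawSchema = true;
          break;
        case OK:
          if (!sawSchema) {
            return false; // data before any schema
          }
          break;
        case NONE:
          if (!sawSchema) {
            return false; // the zero-row case from item 1
          }
          finished = true;
          break;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Zero-row source that still announces its schema.
    System.out.println(isValid(Arrays.asList(Outcome.OK_NEW_SCHEMA, Outcome.NONE)));
    // Zero-row source that goes straight to NONE.
    System.out.println(isValid(Arrays.asList(Outcome.NONE)));
  }
}
```

Under these assumptions, a zero-row source passes only if it reports {{OK_NEW_SCHEMA}} before {{NONE}}.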

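The short-circuit problem in item 5 can be shown in isolation. The sketch below is not the Drill code: {{ShortCircuitSketch}}, {{ChangeFlag}}, and both {{isNewSchema...}} methods are hypothetical, and it assumes only what item 5 implies, namely that the getter on the callback also resets the flag it reads.

```java
// Hypothetical sketch; none of these names are Drill classes.
final class ShortCircuitSketch {

  // Stand-in for a callback whose getter clears the flag as a side effect.
  static final class ChangeFlag {
    private boolean changed;

    void markChanged() {
      changed = true;
    }

    // Returns the current flag value and resets it to false.
    boolean getAndReset() {
      boolean was = changed;
      changed = false;
      return was;
    }

    // Reads the flag without resetting it (for inspection only).
    boolean peek() {
      return changed;
    }
  }

  // Buggy variant: when topLevelChanged is true, || skips getAndReset(),
  // so the nested flag stays set and fires again on the next call.
  static boolean isNewSchemaBuggy(boolean topLevelChanged, ChangeFlag nested) {
    return topLevelChanged || nested.getAndReset();
  }

  // Fixed variant: always read and reset the nested flag, then combine.
  static boolean isNewSchemaFixed(boolean topLevelChanged, ChangeFlag nested) {
    boolean nestedChanged = nested.getAndReset();
    return topLevelChanged || nestedChanged;
  }

  public static void main(String[] args) {
    ChangeFlag nested = new ChangeFlag();
    nested.markChanged();
    isNewSchemaBuggy(true, nested);
    // The nested flag is still set here, so a later call reports a change again.
    System.out.println("stale nested flag after buggy call: " + nested.peek());
  }
}
```

Evaluating the side-effecting call first (or using the non-short-circuit {{|}} operator) is one way to make sure the reset always happens.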

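The stale-record-count pattern behind items 9 and 10 can be reproduced with plain collections. This is a rough analogy only: {{RecordCountSketch}}, {{Batch}}, and its methods are hypothetical, and an {{ArrayList}} stands in for a value vector, so the exception message differs from the "... {{range (0, 0)}}" text seen in Drill.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names are Drill classes.
final class RecordCountSketch {

  // Stand-in for a batch that tracks its record count separately from its data.
  static final class Batch {
    final List<Integer> values = new ArrayList<>();
    int recordCount;

    // Buggy load: the count is only updated for non-empty input,
    // so an empty batch keeps the previous batch's count.
    void loadWithoutReset(List<Integer> incoming) {
      values.clear();
      values.addAll(incoming);
      if (!incoming.isEmpty()) {
        recordCount = incoming.size();
      }
    }

    // Fixed load: the count always matches the data, including zero.
    void loadWithReset(List<Integer> incoming) {
      values.clear();
      values.addAll(incoming);
      recordCount = incoming.size();
    }
  }

  // Downstream consumer that trusts recordCount when indexing into the data.
  static int sum(Batch batch) {
    int total = 0;
    for (int i = 0; i < batch.recordCount; i++) {
      total += batch.values.get(i);
    }
    return total;
  }

  public static void main(String[] args) {
    Batch batch = new Batch();
    batch.loadWithoutReset(Arrays.asList(1, 2, 3));
    batch.loadWithoutReset(Collections.<Integer>emptyList());
    try {
      sum(batch);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("stale count led to: " + e.getClass().getSimpleName());
    }
  }
}
```

With {{loadWithReset}}, the same zero-row batch is simply summed over zero records.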



> ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2288
>                 URL: https://issues.apache.org/jira/browse/DRILL-2288
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Information Schema
>            Reporter: Daniel Barclay (Drill)
>            Assignee: Daniel Barclay (Drill)
>             Fix For: 1.3.0
>
>         Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java
>
>
> The ResultSetMetaData object from getMetaData() of a ResultSet is not set up (getColumnCount()
> returns zero, and trying to access any other metadata throws IndexOutOfBoundsException) for
> a result set with zero rows, at least for one from DatabaseMetaData.getColumns(...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
