drill-issues mailing list archives

From "Daniel Barclay (Drill) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-4001) Empty vectors from previous batch left by MapVector.load(...)/RecordBatchLoader.load(...)
Date Tue, 03 Nov 2015 19:48:27 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Barclay (Drill) updated DRILL-4001:
------------------------------------------
    Description: 
In certain cases, {{MapVector.load(...)}} (called by {{RecordBatchLoader.load(...)}}) returns
with some map child vectors having a length of zero instead of a length matching the length
of their sibling vectors and the number of records in the batch.  This causes {{MapVector.getObject(int)}}
to fail, saying "{{java.lang.IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0))}}"
(one of the errors seen in fixing DRILL-2288).

The condition seems to be that a child field (e.g., an HBase column in an HBase column family)
appears in an earlier batch and does not appear in a later batch.

(The HBase column's child vector gets created, in the MapVector for the HBase column family,
during loading of the earlier batch.  During loading of the later batch, all vectors get reset
to zero length, and then only vectors for fields _appearing in the batch message being loaded_
get loaded and set to the length of the batch; other vectors created from earlier messages/{{load}}
calls are left with a length of zero (instead of, say, being filled with nulls to the length
of their siblings and the current record batch).)
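
The sequence described above can be illustrated with a small toy model.  This is only a sketch:
{{ToyMapVector}}, its {{children}} map, and {{loadWithNullFill}} are hypothetical names invented
for illustration and are not Drill's actual {{MapVector}} or {{RecordBatchLoader}} code.  The
null-fill variant is an assumption about one possible remedy, not the actual fix.  It requires
Java 9 or later for {{Map.of}}/{{List.of}}.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical classes, NOT Drill's MapVector/RecordBatchLoader.
class StaleChildVectorSketch {

    /** Hypothetical stand-in for a map vector holding one child vector per field. */
    static class ToyMapVector {
        // Child vectors persist across load() calls, like children created by earlier batches.
        final Map<String, List<String>> children = new LinkedHashMap<>();

        /** Mimics the described behavior: reset all children, then load only fields in this batch. */
        void load(Map<String, List<String>> batch) {
            for (List<String> child : children.values()) {
                child.clear(); // every existing child is reset to zero length
            }
            for (Map.Entry<String, List<String>> e : batch.entrySet()) {
                children.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
            // Children that are absent from this batch are left at length zero.
        }

        /** Assumed remedy (illustration only): pad short children with nulls up to recordCount. */
        void loadWithNullFill(Map<String, List<String>> batch, int recordCount) {
            load(batch);
            for (List<String> child : children.values()) {
                while (child.size() < recordCount) {
                    child.add(null);
                }
            }
        }

        /** Reads one row across all children; throws if any child is shorter than the batch. */
        Map<String, String> getObject(int index) {
            Map<String, String> row = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : children.entrySet()) {
                row.put(e.getKey(), e.getValue().get(index));
            }
            return row;
        }
    }

    public static void main(String[] args) {
        ToyMapVector family = new ToyMapVector();
        family.load(Map.of("a", List.of("1"), "b", List.of("2"))); // earlier batch: columns a and b
        family.load(Map.of("a", List.of("3")));                    // later batch: column b is absent
        try {
            family.getObject(0);
        } catch (IndexOutOfBoundsException ex) {
            System.out.println("stale child vector: " + ex.getMessage());
        }
        family.loadWithNullFill(Map.of("a", List.of("3")), 1);
        System.out.println(family.getObject(0));
    }
}
```

In the toy model, the second {{load}} call leaves the child for column {{b}} empty while its sibling
holds one record, so reading row 0 fails; padding to the record count lets the read succeed with
a null for the missing column.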

See the TODO(DRILL-4001) mark and workaround in {{MapVector.getObject(int)}}.



  was:
In certain cases, {{MapVector.load(...)}} (called by {{RecordBatchLoader.load(...)}}) returns
with some map child vectors having a length of zero instead of having a length matching the
length of sibling vectors and the number of records in the batch.  This caused IndexOutOfBoundsException
errors saying (roughly) "

  (This caused some of the {{IndexOutOfBoundException}} errors seen in fixing DRILL-2288.)

The condition seems to be that a child field (e.g., an HBase column in a HBase column family)
appears in an earlier batch and does not appear in a later batch.  

(The HBase column's child vector gets created (in the MapVector for the HBase column family)
during loading of the earlier batch.  During loading of the later batch, all vectors get reset
to zero length, and then only vectors for fields _appearing in the batch message being loaded_
get loaded and set to the length of the batch-\-other vectors created from earlier messages/{{load}}
calls are left with a length of zero (instead of, say, being filled with nulls to the length
of their siblings and the current record batch).)

See the TODO(DRILL-4001) mark and workaround in {{MapVector.getObject(int)}}.




> Empty vectors from previous batch left by MapVector.load(...)/RecordBatchLoader.load(...)
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-4001
>                 URL: https://issues.apache.org/jira/browse/DRILL-4001
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>            Reporter: Daniel Barclay (Drill)
>
> In certain cases, {{MapVector.load(...)}} (called by {{RecordBatchLoader.load(...)}})
> returns with some map child vectors having a length of zero instead of a length matching
> the length of their sibling vectors and the number of records in the batch.  This causes {{MapVector.getObject(int)}}
> to fail, saying "{{java.lang.IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0))}}"
> (one of the errors seen in fixing DRILL-2288).
>
> The condition seems to be that a child field (e.g., an HBase column in an HBase column
> family) appears in an earlier batch and does not appear in a later batch.
>
> (The HBase column's child vector gets created, in the MapVector for the HBase column
> family, during loading of the earlier batch.  During loading of the later batch, all vectors
> get reset to zero length, and then only vectors for fields _appearing in the batch message
> being loaded_ get loaded and set to the length of the batch; other vectors created from earlier
> messages/{{load}} calls are left with a length of zero (instead of, say, being filled with
> nulls to the length of their siblings and the current record batch).)
>
> See the TODO(DRILL-4001) mark and workaround in {{MapVector.getObject(int)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
