hive-issues mailing list archives

From "mahesh kumar behera (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.
Date Fri, 14 Feb 2020 06:43:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-22856:
---------------------------------------
    Description: 
LlapArrowBatchRecordReader returns false as soon as ArrowStreamReader.loadNextBatch loads a
batch whose column vectors have length 0, so all remaining batches are skipped. It should
instead keep reading until loadNextBatch itself returns false: a batch with column vectors
of length 0 should be ignored, and the reader should move on to the next batch.

A batch size of 0 is possible when a split read by the ORC reader contains only deleted
or aborted data. In that case VectorizedOrcAcidRowBatchReader sends a batch of size 0. For
such a batch, VectorFileSinkArrowOperator creates a batch containing only metadata and sets
the value count to 0. The client should ignore this kind of batch and wait for the next one.

  was:LlapArrowBatchRecordReader returns false when the ArrowStreamReader loadNextBatch returns
column vector with 0 length. But we should keep reading data until loadNextBatch returns
false. Some batch may return column vector of length 0, but we should ignore and wait for
the next batch.


> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22856
>                 URL: https://issues.apache.org/jira/browse/HIVE-22856
>             Project: Hive
>          Issue Type: Bug
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>         Attachments: HIVE-22856.01.patch, HIVE-22856.02.patch
>
>
> LlapArrowBatchRecordReader returns false as soon as ArrowStreamReader.loadNextBatch loads a
batch whose column vectors have length 0, so all remaining batches are skipped. It should
instead keep reading until loadNextBatch itself returns false: a batch with column vectors
of length 0 should be ignored, and the reader should move on to the next batch.
> A batch size of 0 is possible when a split read by the ORC reader contains only deleted
or aborted data. In that case VectorizedOrcAcidRowBatchReader sends a batch of size 0. For
such a batch, VectorFileSinkArrowOperator creates a batch containing only metadata and sets
the value count to 0. The client should ignore this kind of batch and wait for the next one.
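
The difference between the reported behaviour and the proposed one can be sketched roughly
as follows. This is only an illustration, not the code from the attached patches: BatchSource,
sourceOf, nextStoppingOnEmpty, nextSkippingEmpty and countRows are hypothetical names, and
BatchSource is a simplified stand-in for the Arrow stream reader rather than the real Arrow
API. The batch sizes 3, 0, 2 are made up; the middle one models a split whose rows were all
deleted or aborted.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Predicate;

public class EmptyBatchSkipSketch {

    /** Hypothetical stand-in for the Arrow stream reader; NOT the real Arrow API. */
    interface BatchSource {
        boolean loadNextBatch(); // false only when the stream is exhausted
        int rowCount();          // number of rows in the most recently loaded batch
    }

    /** Replays a fixed sequence of batch sizes (illustration only). */
    static BatchSource sourceOf(Integer... sizes) {
        Iterator<Integer> it = Arrays.asList(sizes).iterator();
        return new BatchSource() {
            private int current = 0;

            @Override
            public boolean loadNextBatch() {
                if (!it.hasNext()) {
                    return false;
                }
                current = it.next();
                return true;
            }

            @Override
            public int rowCount() {
                return current;
            }
        };
    }

    /** Reported behaviour: a 0-row batch is mistaken for end of stream. */
    static boolean nextStoppingOnEmpty(BatchSource src) {
        return src.loadNextBatch() && src.rowCount() > 0;
    }

    /** Proposed behaviour: skip 0-row batches, stop only when loadNextBatch is false. */
    static boolean nextSkippingEmpty(BatchSource src) {
        while (src.loadNextBatch()) {
            if (src.rowCount() > 0) {
                return true; // a batch with data is ready for the caller
            }
            // metadata-only batch: ignore it and try the next one
        }
        return false; // genuine end of stream
    }

    /** Drains the source with the given "next" strategy and sums the rows seen. */
    static int countRows(BatchSource src, Predicate<BatchSource> next) {
        int total = 0;
        while (next.test(src)) {
            total += src.rowCount();
        }
        return total;
    }

    public static void main(String[] args) {
        // Stops at the empty batch, so the final 2 rows are lost: prints 3
        System.out.println(countRows(sourceOf(3, 0, 2), EmptyBatchSkipSketch::nextStoppingOnEmpty));
        // Skips the empty batch and reads to the end of the stream: prints 5
        System.out.println(countRows(sourceOf(3, 0, 2), EmptyBatchSkipSketch::nextSkippingEmpty));
    }
}
```

In the real reader the row count would presumably come from the Arrow VectorSchemaRoot
(getRowCount) rather than from a helper like rowCount() above; that mapping is an assumption
and is not stated in this issue.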



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
