hive-issues mailing list archives

From "Hive QA (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-23034) Arrow serializer should not keep the reference of arrow offset and validity buffers
Date Wed, 18 Mar 2020 11:11:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061629#comment-17061629 ]

Hive QA commented on HIVE-23034:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12996984/HIVE-23034.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18122 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/21156/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/21156/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-21156/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12996984 - PreCommit-HIVE-Build

> Arrow serializer should not keep the reference of arrow offset and validity buffers
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-23034
>                 URL: https://issues.apache.org/jira/browse/HIVE-23034
>             Project: Hive
>          Issue Type: Bug
>          Components: llap, Serializers/Deserializers
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23034.01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, part of the writeList() method in the arrow serializer is implemented like this - 
> {code:java}
> final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
> int nextOffset = 0;
> for (int rowIndex = 0; rowIndex < size; rowIndex++) {
>   int selectedIndex = rowIndex;
>   if (vectorizedRowBatch.selectedInUse) {
>     selectedIndex = vectorizedRowBatch.selected[rowIndex];
>   }
>   if (hiveVector.isNull[selectedIndex]) {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>   } else {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>     nextOffset += (int) hiveVector.lengths[selectedIndex];
>     arrowVector.setNotNull(rowIndex);
>   }
> }
> offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
> {code}
> 1) Here we obtain a reference once, {{final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();}}, and then keep updating both the arrow vector and the offset buffer through it.
> Problem - 
> {{arrowVector.setNotNull(rowIndex)}} checks the index on every call and, when a threshold is crossed, reallocates the offset and validity buffers, updates its internal references, and releases the old buffers (which decrements their reference count). The reference obtained in 1) is now stale. If we then try to read or write the old buffer, we see - 
> {code:java}
> Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
> 	at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
> 	at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
> 	at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
> 	at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
> {code}
>  
> Solution - 
> This can be fixed by fetching the buffer again ( {{arrowVector.getOffsetBuffer()}} ) each time we want to update it.
> In our internal tests, this failure is seen very frequently on arrow 0.8.0 but not on 0.10.0. It should still be handled the same way for 0.10.0, as that version does the same thing.
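
The difference between the cached and the re-fetched buffer can be illustrated with a minimal, self-contained sketch. It does not use Arrow or Hive: {{RefCountedBuf}} and {{GrowableListVector}} are hypothetical stand-ins that only mimic the behaviour described above (a reallocation inside {{setNotNull}} releases the old buffer), offsets are indexed by slot rather than by byte, and null rows are left out for brevity.

```java
/** Illustration only: hypothetical stand-ins, not Arrow or Hive classes. */
public class StaleBufferSketch {

  /** A buffer that rejects access once released, loosely modelled on a ref-counted ArrowBuf. */
  static final class RefCountedBuf {
    final int[] data;
    private int refCnt = 1;

    RefCountedBuf(int capacity) { data = new int[capacity]; }

    void setInt(int index, int value) {
      if (refCnt == 0) throw new IllegalStateException("refCnt: 0");
      data[index] = value;
    }

    int getInt(int index) {
      if (refCnt == 0) throw new IllegalStateException("refCnt: 0");
      return data[index];
    }

    void release() { refCnt = 0; }
  }

  /** A vector whose setNotNull() may swap in a larger offset buffer and release the old one. */
  static final class GrowableListVector {
    private RefCountedBuf offsets = new RefCountedBuf(2);

    RefCountedBuf getOffsetBuffer() { return offsets; }

    void setNotNull(int index) {
      // Make sure the slot for the end offset (index + 1) exists.
      while (index + 1 >= offsets.data.length) {
        RefCountedBuf bigger = new RefCountedBuf(offsets.data.length * 2);
        System.arraycopy(offsets.data, 0, bigger.data, 0, offsets.data.length);
        offsets.release(); // the previously handed-out reference is now stale
        offsets = bigger;
      }
    }
  }

  /** Problem pattern: the buffer is fetched once and reused across reallocations. */
  static void writeOffsetsCached(GrowableListVector vector, int[] lengths) {
    final RefCountedBuf offsetBuffer = vector.getOffsetBuffer();
    int nextOffset = 0;
    for (int row = 0; row < lengths.length; row++) {
      offsetBuffer.setInt(row, nextOffset);
      nextOffset += lengths[row];
      vector.setNotNull(row);
    }
    offsetBuffer.setInt(lengths.length, nextOffset);
  }

  /** Proposed pattern: ask the vector for its current buffer before every write. */
  static void writeOffsetsRefetched(GrowableListVector vector, int[] lengths) {
    int nextOffset = 0;
    for (int row = 0; row < lengths.length; row++) {
      vector.getOffsetBuffer().setInt(row, nextOffset);
      nextOffset += lengths[row];
      vector.setNotNull(row);
    }
    vector.getOffsetBuffer().setInt(lengths.length, nextOffset);
  }

  public static void main(String[] args) {
    int[] lengths = {2, 3, 1};
    try {
      writeOffsetsCached(new GrowableListVector(), lengths);
      System.out.println("cached: ok");
    } catch (IllegalStateException e) {
      System.out.println("cached: failed with " + e.getMessage());
    }
    GrowableListVector vector = new GrowableListVector();
    writeOffsetsRefetched(vector, lengths);
    System.out.println("refetched: last offset = "
        + vector.getOffsetBuffer().getInt(lengths.length));
  }
}
```

In this sketch the cached variant fails on the first write after the buffer grows, while the re-fetched variant completes. The extra {{getOffsetBuffer()}} call per write is assumed to be cheap, since it only returns the vector's current reference.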



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
