hive-user mailing list archives

From Prasanth Jayachandran <pjayachand...@hortonworks.com>
Subject Re: Error inserting data to ORC table
Date Wed, 27 Nov 2013 19:01:37 GMT
Hi Juan

This seems like a bug in version 2 of RLE (Run Length Encoding), which was introduced in Hive
0.12. The new RLE version can be disabled by setting hive.exec.orc.write.format="0.11",
which falls back to the old RLE version.

The reason changing the column to a string type works is that string columns use adaptive
dictionary encoding, whereas integer columns use run length encoding.
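A minimal sketch of the workaround in a Hive session (the table and column names here are illustrative, not from your schema):

```sql
-- Fall back to the Hive 0.11 RLE writer for ORC files
SET hive.exec.orc.write.format=0.11;

-- The INSERT ... SELECT should then succeed with BIGINT columns
INSERT OVERWRITE TABLE ads_orc
SELECT id, ad_id, account_id FROM ads_text;
```

Note this only changes how new ORC files are written; existing files are unaffected, and readers handle both formats.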

Can you file a bug for this with the steps to reproduce the issue? Also, what dataset
are you using? Would it be possible to attach the segment of the dataset that causes this
failure to the bug report?

Thanks
Prasanth Jayachandran

On Nov 27, 2013, at 5:02 AM, Juan Martin Pampliega <jpampliega@gmail.com> wrote:

> Hi,
> 
> I am using Hive 0.12 with Hadoop 2.2 and trying to insert data in a new ORC table with
an INSERT SELECT statement from a TEXT file based table and I am running into the following
error (I have trimmed some of the data showed in the error):
> 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row 
> {"id":"1932685422","ad_id":"7325801318", .... , "account_id":"6875965212"}
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
>         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
>         ... 8 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 26
>         at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.preparePatchedBlob(RunLengthIntegerWriterV2.java:593)
>         at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.determineEncoding(RunLengthIntegerWriterV2.java:541)
>         at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.write(RunLengthIntegerWriterV2.java:797)
>         at org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:744)
> ...
> 
> That error is produced when the ad_id column in the destination table has the type BIGINT.
When I change the column type to STRING the insert works fine. 
> 
> From what I can see, that value is not nearly big enough to cause any overflow issues in a
BIGINT. 
> 
> Is this a known bug or do I have to do anything in particular for this to work?
> 
> Thanks,
> Juan.


