hive-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11592) ORC metadata section can sometimes exceed protobuf message size limit
Date Sat, 22 Aug 2015 00:12:45 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707674#comment-14707674 ]

Owen O'Malley commented on HIVE-11592:
--------------------------------------

Does this patch detect the case where a field ends at the buffer boundary? It seems like that
would be undetected and thus not expand the range.

> ORC metadata section can sometimes exceed protobuf message size limit
> ---------------------------------------------------------------------
>
>                 Key: HIVE-11592
>                 URL: https://issues.apache.org/jira/browse/HIVE-11592
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>             Fix For: 1.3.0, 2.0.0
>
>         Attachments: HIVE-11592.1.patch, HIVE-11592.2.patch, HIVE-11592.3.patch
>
>
> If a file has many small stripes and many columns, the overhead of storing metadata
> (column stats) can exceed the default protobuf message size limit of 64MB. Reading such
> files throws the following exception:
> {code}
> Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
>         at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>         at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>         at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811)
>         at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1331)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1281)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369)
>         at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4887)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4803)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985)
>         at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12925)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12872)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956)
>         at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13599)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13546)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630)
>         at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>         at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>         at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>         at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746)
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.<init>(ReaderImpl.java:468)
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:314)
>         at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
>         at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:67)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> The only solution for this is to programmatically increase the CodedInputStream size limit.
> We should make this configurable via Hive config so that the ORC file is readable. Alternatively,
> we can keep increasing the size limit until parsing succeeds.
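
The "keep increasing the size until parsing succeeds" alternative in the description above could look roughly like the following. This is only a sketch of the retry idea, not the code in the attached patches. To stay self-contained it uses no protobuf classes: `GrowingLimitParse`, `LimitedParser`, `SizeLimitExceededException`, and the limits in `main` are all hypothetical names and values. In a real reader the callback would presumably wrap the metadata bytes in a `CodedInputStream`, call `setSizeLimit(limit)` as the exception message suggests, and parse `OrcProto.Metadata` from it.

```java
public class GrowingLimitParse {

    /** Hypothetical stand-in for protobuf's "message was too large" failure. */
    public static class SizeLimitExceededException extends Exception {
        public SizeLimitExceededException(int limit) {
            super("Message exceeds size limit of " + limit + " bytes");
        }
    }

    /** Hypothetical callback: parse the bytes while enforcing the given size limit. */
    public interface LimitedParser<T> {
        T parse(byte[] data, int sizeLimit) throws SizeLimitExceededException;
    }

    /**
     * Retries the parse, doubling the size limit after each size-limit failure,
     * until parsing succeeds or the limit has reached maxLimit.
     */
    public static <T> T parseWithGrowingLimit(byte[] data, LimitedParser<T> parser,
                                              int initialLimit, int maxLimit)
            throws SizeLimitExceededException {
        if (initialLimit <= 0 || maxLimit < initialLimit) {
            throw new IllegalArgumentException("need 0 < initialLimit <= maxLimit");
        }
        int limit = initialLimit;
        while (true) {
            try {
                return parser.parse(data, limit);
            } catch (SizeLimitExceededException e) {
                if (limit >= maxLimit) {
                    throw e; // already tried the cap; give up
                }
                // Double in long arithmetic so the int limit cannot overflow.
                limit = (int) Math.min((long) limit * 2, (long) maxLimit);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] metadata = new byte[200]; // pretend this is an oversized metadata section
        // Toy parser: fails when the input is larger than the limit, else returns its size.
        LimitedParser<Integer> parser = (data, limit) -> {
            if (data.length > limit) {
                throw new SizeLimitExceededException(limit);
            }
            return data.length;
        };
        int parsed = parseWithGrowingLimit(metadata, parser, 64, 1024);
        System.out.println("parsed " + parsed + " bytes"); // prints "parsed 200 bytes"
    }
}
```

The `maxLimit` cap is a design choice of this sketch: it keeps a corrupt or hostile file from driving the limit up without bound, and it is the natural value to expose through a Hive config setting if the configurable approach is preferred.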



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
