hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11592) ORC metadata section can sometimes exceed protobuf message size limit
Date Wed, 19 Aug 2015 03:27:45 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702377#comment-14702377 ]

Hive QA commented on HIVE-11592:
--------------------------------



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751102/HIVE-11592.3.patch

{color:green}SUCCESS:{color} +1 9370 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5003/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5003/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5003/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751102 - PreCommit-HIVE-TRUNK-Build

> ORC metadata section can sometimes exceed protobuf message size limit
> ---------------------------------------------------------------------
>
>                 Key: HIVE-11592
>                 URL: https://issues.apache.org/jira/browse/HIVE-11592
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-11592.1.patch, HIVE-11592.2.patch, HIVE-11592.3.patch
>
>
> If a file has too many small stripes and many columns, the overhead of storing metadata (column stats) can exceed the default protobuf message size limit of 64MB. Reading such files throws the following exception:
> {code}
> Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
>         at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>         at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>         at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811)
>         at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1331)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1281)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369)
>         at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4887)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4803)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985)
>         at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12925)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12872)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956)
>         at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13599)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13546)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630)
>         at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>         at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>         at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>         at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746)
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.<init>(ReaderImpl.java:468)
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:314)
>         at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
>         at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:67)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> The only solution for this is to programmatically increase the CodedInputStream size limit. We should make this configurable via Hive config so that the ORC file is readable. Alternatively, we can keep increasing the size limit until parsing succeeds.
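
The second option in the description (retry with a larger limit until parsing succeeds) could be sketched roughly as follows. This is a minimal illustration and not the code in the attached patches: GrowingLimitParser, BoundedParser and SizeLimitExceededException are hypothetical names, and the actual protobuf call is left out so the sketch runs without protobuf on the classpath. In a real reader, each attempt would presumably apply the limit through CodedInputStream.setSizeLimit(), the method named in the exception message, before parsing the metadata.

```java
// Sketch only: every name in this file is hypothetical, not a Hive or protobuf API.
final class GrowingLimitParser {

    /** Hypothetical stand-in for protobuf's "message was too large" failure. */
    static final class SizeLimitExceededException extends RuntimeException {
        SizeLimitExceededException(String message) {
            super(message);
        }
    }

    /** One parse attempt, bounded by a size limit in bytes. */
    interface BoundedParser<T> {
        T parse(int sizeLimit);
    }

    /**
     * Tries the parser with startLimit and doubles the limit after each
     * size-limit failure. Gives up and rethrows once an attempt made at
     * maxLimit has also failed.
     */
    static <T> T parseWithGrowingLimit(BoundedParser<T> parser, int startLimit, int maxLimit) {
        if (startLimit <= 0 || maxLimit < startLimit) {
            throw new IllegalArgumentException("need 0 < startLimit <= maxLimit");
        }
        long limit = startLimit; // long, so that doubling cannot overflow int
        while (true) {
            try {
                return parser.parse((int) limit);
            } catch (SizeLimitExceededException e) {
                if (limit >= maxLimit) {
                    throw e; // the cap was already tried; stop retrying
                }
                limit = Math.min(limit * 2, (long) maxLimit);
            }
        }
    }
}
```

A caller would pass a lambda that re-reads the metadata bytes with the given limit. The hard cap (maxLimit) is an assumption added here so that a corrupt or hostile file cannot drive the reader into unbounded memory use, which is the reason protobuf enforces a limit in the first place.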



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
