orc-user mailing list archives

From Yonatan Augarten <y...@intango.com>
Subject Re: Google Protobuf Version
Date Tue, 26 Sep 2017 08:36:03 GMT
No, the file is invalid. The problem is that our code sometimes generates
invalid ORC files.
The code is always called from a single thread, and it performs a series of
"addRowBatch" calls on a writer.
The file is then closed and loaded into a Hive table.
This works 99% of the time, but in some cases the resulting file is
somehow corrupt.
See below for the stack trace from an attempt to run orcfiledump on this file.

Thanks for your help,
Yoni.

Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
    at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
    at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
    at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
    at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.<init>(OrcProto.java:16466)
    at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.<init>(OrcProto.java:16424)
    at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$1.parsePartialFrom(OrcProto.java:16562)
    at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$1.parsePartialFrom(OrcProto.java:16557)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:89)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:95)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
    at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.parseFrom(OrcProto.java:16910)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:374)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:311)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
    at org.apache.hadoop.hive.ql.io.orc.FileDump.printMetaData(FileDump.java:96)
    at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:81)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
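
Since the trace fails while parsing the PostScript at the end of the file, a quick
look at the raw head and tail bytes may help narrow things down. Below is a minimal
sketch in Python, not part of ORC or Hive. It assumes the layout described in the
ORC format specification: the file starts with the 3-byte magic "ORC", the last
byte of the file holds the PostScript length, and the PostScript normally ends
with the same magic (very old files may only have it in the header). The function
name inspect_orc_tail and the returned keys are made up for illustration.

```python
import os

ORC_MAGIC = b"ORC"


def inspect_orc_tail(path):
    """Report header magic, PostScript length, and tail magic for a file.

    This only looks at raw bytes; it does not parse the PostScript itself.
    """
    size = os.path.getsize(path)
    result = {
        "size": size,
        "header_magic": False,
        "postscript_length": None,
        "tail_magic": False,
    }
    # Too short to hold the magic plus the one-byte PostScript length.
    if size < len(ORC_MAGIC) + 1:
        return result

    with open(path, "rb") as f:
        # The file should start with the magic bytes.
        result["header_magic"] = f.read(len(ORC_MAGIC)) == ORC_MAGIC

        # The final byte of the file is the serialized PostScript length.
        f.seek(size - 1)
        ps_len = f.read(1)[0]
        result["postscript_length"] = ps_len

        # If the length is plausible, the bytes just before the length byte
        # are expected to be the magic (assumption: writer emits it last).
        if len(ORC_MAGIC) <= ps_len <= size - 1:
            f.seek(size - 1 - len(ORC_MAGIC))
            result["tail_magic"] = f.read(len(ORC_MAGIC)) == ORC_MAGIC

    return result
```

Running inspect_orc_tail on one good file and one corrupt file and comparing the
results could show whether the tail was truncated or overwritten (for example, a
PostScript length of 0, or one larger than the file), as opposed to damage inside
an otherwise well-placed PostScript.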



On Tue, Sep 26, 2017 at 12:11 AM, Owen O'Malley <owen.omalley@gmail.com>
wrote:

> On Mon, Sep 25, 2017 at 12:47 PM, Yonatan Augarten <yoni@intango.com>
> wrote:
>
>> Would you say that it's likely that this error (*Protocol message
>> contained an invalid tag (zero)*) is caused by the wrong version?
>>
>
>  No, it is likely something else. However, I haven't seen that error
> coming out of the ORC reader before. Can you give me the whole stack trace?
> Are you sure that it is a valid ORC file?
>
> Thanks,
>    Owen
>
