orc-user mailing list archives

From Yonatan Augarten <y...@intango.com>
Subject Re: Google Protobuf Version
Date Tue, 26 Sep 2017 19:09:45 GMT
Thank you for the detailed explanation!

Interesting. I'm getting the following (very strange) output, including the
spaces before the 0:

>       0 ORC&
> 10288812     ORC
> 14991902 ORC
> 33162184 ORC_R
>

The file size is 39845888 bytes.
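
Owen's description of the file layout (quoted below) can be turned into a quick check. If the last "ORC" at offset 33162184 were the postscript magic, the file would be 33162184 + 4 = 33162188 bytes long rather than 39845888, so the tail does not have the expected shape. The following is a minimal sketch that assumes only that layout: "ORC" at offset 0, a postscript whose last 3 bytes are "ORC", then one postscript-length byte. OrcTailCheck and its methods are hypothetical helpers, not part of the ORC library, and the sketch does not parse the protobuf postscript itself. Note that `strings -t d` prints the start of each printable run, so an "ORC" preceded by printable bytes (such as spaces) sits a few bytes after the printed offset, while this sketch reports the position of "ORC" directly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the ORC library. It only checks the byte
// layout described in this thread; it does not parse the protobuf postscript.
public class OrcTailCheck {

    // True if the 3 bytes starting at `offset` are the ASCII magic "ORC".
    static boolean hasMagicAt(byte[] data, int offset) {
        return offset >= 0 && offset + 3 <= data.length
                && data[offset] == 'O' && data[offset + 1] == 'R' && data[offset + 2] == 'C';
    }

    // Every offset where "ORC" occurs. Unlike `strings -n 3 -t d | grep ORC`,
    // this reports the position of "ORC" itself, not the start of a printable run.
    static List<Long> magicOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        for (int i = 0; i + 3 <= data.length; i++) {
            if (hasMagicAt(data, i)) {
                offsets.add((long) i);
            }
        }
        return offsets;
    }

    // Checks the assumed layout: "ORC" at offset 0, and the file ending with
    // <postscript whose last 3 bytes are "ORC"><1-byte postscript length>.
    static boolean tailLooksValid(byte[] data) {
        int n = data.length;
        // Lower bound: 3-byte header + at least 3 postscript bytes + length byte.
        if (n < 7 || !hasMagicAt(data, 0)) {
            return false;
        }
        int psLen = data[n - 1] & 0xFF; // last byte, read as unsigned
        // The postscript must at least hold "ORC" and must not overlap the header.
        if (psLen < 3 || psLen > n - 4) {
            return false;
        }
        // "ORC" must sit right before the length byte, i.e. at file length - 4.
        return hasMagicAt(data, n - 4);
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file into memory, which is fine for a one-off check.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        List<Long> offsets = magicOffsets(data);
        System.out.println("file length:      " + data.length);
        System.out.println("\"ORC\" offsets:    " + offsets);
        if (!offsets.isEmpty()) {
            long last = offsets.get(offsets.size() - 1);
            System.out.println("expected length:  " + (last + 4) + " (last offset + 4)");
        }
        System.out.println("tail looks valid: " + tailLooksValid(data));
    }
}
```

It can be run with `javac OrcTailCheck.java && java OrcTailCheck /path/to/file.orc` (the path is a placeholder). A file that passes this check can still be corrupt elsewhere; a file that fails it will not get past the postscript parsing step in the stack trace below.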

On Tue, Sep 26, 2017 at 11:49 AM, Owen O'Malley <owen.omalley@gmail.com>
wrote:

> OK, it was failing while reading the postscript (via
> OrcProto$PostScript.parseFrom), which is the very first thing the reader does.
>
> The first thing to try is to see whether you have a proper postscript
> somewhere in the file. If you are on Mac or Linux, try:
>
> % strings -n 3 -t d example/decimal.orc | grep ORC
>
> Replace example/decimal.orc with the path to your ORC file. You'll get
> output like:
>
> 0 ORC
> 16333 ORC
>
> which are the offsets where "ORC" is located. The ORC format puts it once
> at the front of the file (so that the "file" command can detect the format)
> and once at the end of the postscript. (There is always exactly one byte
> after the last ORC, which is the length of the postscript, so the total
> length of the file should be the final offset + 4: 3 bytes for "ORC" plus
> the length byte.)
>
> .. Owen
>
> On Tue, Sep 26, 2017 at 1:36 AM, Yonatan Augarten <yoni@intango.com>
> wrote:
>
>> No, the file is invalid. The problem is that our code sometimes generates
>> invalid ORC files.
>> The code is always called from a single thread, and it performs a series
>> of addRowBatch calls on a writer.
>> The file is then closed and loaded into a Hive table.
>> This works 99% of the time, but in some cases the resulting file is
>> somehow corrupt.
>> Below is the stack trace from an attempt to run orcfiledump on this file.
>>
>> Thanks for your help,
>> Yoni.
>>
>> Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
>>     at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
>>     at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
>>     at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
>>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.<init>(OrcProto.java:16466)
>>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.<init>(OrcProto.java:16424)
>>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$1.parsePartialFrom(OrcProto.java:16562)
>>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$1.parsePartialFrom(OrcProto.java:16557)
>>     at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:89)
>>     at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:95)
>>     at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.parseFrom(OrcProto.java:16910)
>>     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:374)
>>     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:311)
>>     at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
>>     at org.apache.hadoop.hive.ql.io.orc.FileDump.printMetaData(FileDump.java:96)
>>     at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:81)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>
>>
>>
>> On Tue, Sep 26, 2017 at 12:11 AM, Owen O'Malley <owen.omalley@gmail.com>
>> wrote:
>>
>>> On Mon, Sep 25, 2017 at 12:47 PM, Yonatan Augarten <yoni@intango.com>
>>> wrote:
>>>
>>>> Would you say that it's likely that this error (*Protocol message
>>>> contained an invalid tag (zero)*) is caused by the wrong version?
>>>>
>>>
>>> No, it is likely something else. However, I haven't seen that error
>>> coming out of the ORC reader before. Can you give me the whole stack trace?
>>> Are you sure that it is a valid ORC file?
>>>
>>> Thanks,
>>>    Owen
>>>
>>
>>
>
