hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Prasanth Jayachandran <pjayachand...@hortonworks.com>
Subject Re: Malformed orc file
Date Thu, 04 Aug 2016 18:16:38 GMT
Hi

In case of streaming, when a transaction is open orc file is not closed and hence may not
be flushed completely. Did the transaction commit successfully? Or was there any exception
thrown during writes/commit?

Thanks
Prasanth

On Aug 3, 2016, at 6:09 AM, Igor Kuzmenko <f1sherox@gmail.com<mailto:f1sherox@gmail.com>>
wrote:

Hello, I've got a malformed ORC file in my Hive table. File was created by Hive Streaming
API and I have no idea under what circumstances it became corrupted.

File on google drive: link<https://drive.google.com/file/d/0ByB92PAoAkrKeFFZRUN4WWVQY1U/view?usp=sharing>

Exception message when trying to perform select from table:
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1468498236400_1106_6_00, diagnostics=[Task
failed, taskId=task_1468498236400_1106_6_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error:
Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException:
org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000<http://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000>.
Invalid postscript length 0
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.IOException: org.apache.hadoop.hive.ql.io.FileFormatException:
Malformed ORC file hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000<http://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000>.
Invalid postscript length 0
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:326)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed
ORC file hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000<http://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000>.
Invalid postscript length 0
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
... 19 more
Caused by: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000<http://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000>.
Invalid postscript length 0
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.ensureOrcFooter(ReaderImpl.java:236)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:376)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:317)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:238)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1259)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
... 20 more

Does anyone encountered such a situation?


Mime
View raw message