Subject: Re: Malformed Orc file Invalid postscript length 0
From: "Owen O'Malley"
To: "user@hive.apache.org"
Cc: "Bhavana Kamichetty (bkamiche)"
Date: Fri, 22 May 2015 09:51:35 -0700

Bhavana,
   Could you send me (omalley@apache.org) the incorrect ORC file? Which file system were you using? HDFS? Which version of Hadoop and Hive?
Thanks,
   Owen

On Fri, May 22, 2015 at 9:37 AM, Grant Overby (groverby) wrote:

> I'm getting the following exception when Hive executes a query on an
> external table. It seems the postscript isn't written even though .close()
> is called and returns normally. Any thoughts?
>
> java.io.IOException: Malformed ORC file
> hdfs://twig06.twigs:8020/warehouse/completed/events/connection_events/dt=1432229400/1432229419251-bb46892c-939f-45ca-b867-da3675d0ca72.orc.
> Invalid postscript length 0
>     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.ensureOrcFooter(ReaderImpl.java:230)
>     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:370)
>     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:311)
>     at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1130)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1039)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
>
> These ORC files are written manually using an ORC writer:
>
> Path tmpPath = new Path(tmpPathName);
> Configuration writerConf = new Configuration();
> OrcFile.WriterOptions writerOptions = OrcFile.writerOptions(writerConf);
> writerOptions.bufferSize(256 * 1024);
> writerOptions.compress(SNAPPY);
> writerOptions.fileSystem(fileSystem);
> writerOptions.inspector(new FlatTableObjectInspector(dbName + "." + tableName, fields));
> writerOptions.rowIndexStride(10_000);
> writerOptions.blockPadding(true);
> writerOptions.stripeSize(122 * 1024 * 1024);
> writerOptions.version(V_0_12);
> writer = OrcFile.createWriter(tmpPath, writerOptions);
>
> writer.close() is executed, and only if writer.close() returns
> normally is the ORC file moved from a tmp dir to the external table
> partition's dir.
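[Editor's note: in the ORC format, the very last byte of the file records the length of the postscript, so "Invalid postscript length 0" means the reader found a zero in that final byte, i.e. the file's tail never reached the filesystem. A minimal, self-contained sketch of a tail-byte sanity check that could run on the tmp file before it is published (the class and helper names here are illustrative, not from the thread):]

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: validate that a supposed ORC file ends with a non-zero
// postscript-length byte. A zero there is exactly the condition the
// "Invalid postscript length 0" exception reports.
public class OrcTailCheck {
    static boolean hasValidPostscriptLength(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long len = raf.length();
            if (len == 0) {
                return false;        // empty file: nothing was ever flushed
            }
            raf.seek(len - 1);       // last byte holds the postscript length
            return raf.read() > 0;   // 0 means the footer/postscript never landed
        }
    }

    public static void main(String[] args) throws IOException {
        // Demonstration with a throwaway file whose last byte is 0.
        java.io.File f = java.io.File.createTempFile("orc-tail", ".orc");
        try (java.io.FileOutputStream out = new java.io.FileOutputStream(f)) {
            out.write(new byte[]{1, 2, 3, 0});
        }
        System.out.println(hasValidPostscriptLength(f.getPath())); // prints false
        f.delete();
    }
}
```

Such a check only detects the corruption earlier; it does not explain why close() returned normally without persisting the tail.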
>
> private void closeWriter() {
>     if (writer != null) {
>         try {
>             writer.close();
>             Path tmpPath = new Path(tmpPathName);
>             if (fileSystem.exists(tmpPath) && fileSystem.getFileStatus(tmpPath).getLen() > 0) {
>                 Path completedPath = new Path(completedPathName);
>                 fileSystem.setPermission(tmpPath, PERMISSION_664);
>                 fileSystem.rename(tmpPath, completedPath);
>                 HiveOperations.getInstance().registerExternalizedPartition(dbName, tableName, partition);
>             } else if (fileSystem.exists(tmpPath)) {
>                 fileSystem.delete(tmpPath, false);
>             }
>         } catch (IOException e) {
>             Throwables.propagate(e);
>         } finally {
>             writer = null;
>         }
>     }
> }
>
> I expect writer.close() to write the postscript, but it seems not to have.
>
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hive/hive-exec=/0.14.0/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java#WriterImpl.close%28%29
>
> Thoughts?
> Am I doing something wrong? Bug?
> Fix?
>
> Grant Overby
> Software Engineer
> Cisco.com
> groverby@cisco.com
> Mobile: 865 724 4910
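[Editor's note: the closeWriter() above follows a write-to-tmp, verify, then rename pattern. A self-contained local-filesystem analogue of that pattern using java.nio (a sketch under stated assumptions, not the HDFS code from the thread; class and path names are illustrative, and HDFS rename semantics differ, e.g. it will not overwrite an existing destination):]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of tmp-then-rename publishing: the file becomes visible at its
// final path only after it verifies as non-empty, mirroring closeWriter().
public class TmpThenRename {
    static boolean publish(Path tmp, Path completed) throws IOException {
        if (Files.exists(tmp) && Files.size(tmp) > 0) {
            // ATOMIC_MOVE: readers never observe a partially written file
            // at the destination (both paths must be on the same filesystem).
            Files.move(tmp, completed, StandardCopyOption.ATOMIC_MOVE);
            return true;
        }
        Files.deleteIfExists(tmp); // discard empty/failed output, as closeWriter() does
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("orc-demo");
        Path tmp = dir.resolve("part-0000.orc.tmp");
        Files.write(tmp, new byte[]{1, 2, 3});
        boolean ok = publish(tmp, dir.resolve("part-0000.orc"));
        System.out.println(ok); // prints true
    }
}
```

Note that a length check like `getLen() > 0` accepts any non-empty file, which is why the malformed files in the thread still get renamed; combining it with a tail-byte check would be stricter.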