Date: Tue, 16 Sep 2014 17:32:34 +0000 (UTC)
From: "Pankit Thapar (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-8137) Empty ORC file handling

    [ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135778#comment-14135778 ]

Pankit Thapar commented on HIVE-8137:
-------------------------------------

The issue is that Hadoop might create a split when the input format is a CombineInputFormat. Hadoop specifically creates empty splits.

> Empty ORC file handling
> -----------------------
>
>                 Key: HIVE-8137
>                 URL: https://issues.apache.org/jira/browse/HIVE-8137
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>    Affects Versions: 0.13.1
>            Reporter: Pankit Thapar
>             Fix For: 0.14.0
>
>
> Hive 0.13 does not handle reading a zero-size ORC file properly. An ORC file is supposed to have a postscript,
> which the ReaderImpl class tries to read and use to initialize the footer. But if the file is empty
> (zero size), ReaderImpl runs into an IndexOutOfBoundsException because it tries to read the postscript in its constructor.
> Code snippet:
>     // get length of PostScript
>     int psLen = buffer.get(readSize - 1) & 0xff;
> In the above code, readSize for an empty file is zero, so the index passed to buffer.get() is -1.
> I see that the ensureOrcFooter() method performs some sanity checks on the footer,
> so either we can move the above code snippet into ensureOrcFooter() and throw a "Malformed ORC file" exception, or we can create a dummy Reader that does not initialize the footer and whose hasNext() returns false on the first call.
> Basically, I would like to know the correct way to handle an empty ORC file in a mapred job.
> Should we ignore it and not throw an exception, or should we throw an exception saying that the ORC file is malformed?
> Please let me know your thoughts on this.
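
The first option in the description (reject the empty file with a clear error instead of an index error) could look roughly like the sketch below. This is only an illustration: the class `OrcPostScriptGuard` and the method `readPostScriptLength` are hypothetical names, not part of Hive, and the use of `IOException` for the "Malformed ORC file" case is an assumption. Only the masked read of the last byte is taken from the quoted snippet.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical helper, not part of Hive: guards the PostScript-length read. */
final class OrcPostScriptGuard {

    private OrcPostScriptGuard() {}

    /**
     * Returns the PostScript length stored in the last byte that was read.
     * Assumption: an empty read (readSize == 0) is reported as a malformed
     * file rather than being allowed to reach buffer.get(-1).
     */
    static int readPostScriptLength(ByteBuffer buffer, int readSize) throws IOException {
        if (readSize <= 0) {
            throw new IOException("Malformed ORC file: file is empty, no PostScript to read");
        }
        // Same expression as the snippet in the issue; & 0xff makes the byte unsigned.
        return buffer.get(readSize - 1) & 0xff;
    }

    public static void main(String[] args) {
        ByteBuffer tail = ByteBuffer.wrap(new byte[] {1, 2, 7});
        try {
            System.out.println("psLen = " + readPostScriptLength(tail, tail.remaining()));
            // A zero-size file: the guard fires before any buffer access.
            readPostScriptLength(ByteBuffer.allocate(0), 0);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The second option (a dummy Reader whose hasNext() is false) would instead return early at the same check, which may suit the empty-split case raised in the comment better than failing the job.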