From: prasanthj
To: dev@orc.apache.org
Reply-To: dev@orc.apache.org
Subject: [GitHub] orc issue #163: ORC-162. Handle 0 byte files as empty ORC files.
Date: Tue, 29 Aug 2017 22:34:01 +0000 (UTC)
Message-Id: <20170829223401.BB8E5E7DFA@git1-us-west.apache.org>

Github user prasanthj commented on the issue:

    https://github.com/apache/orc/pull/163

Hive creates empty files only for MR to support bucketed joins. Tez doesn't create empty bucket files anymore. Hive currently discards empty files during split generation.
We can do something similar in Orc's version of OrcInputFormat (or add an EmptyFilePathPattern to ignore 0-length files or files whose length is <= MAGIC.length). Creating splits for empty files is useless anyway. As for calling the Reader directly with an empty file path, we can treat it as an empty file with schema struct<>.
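
To make the length check concrete, here is a minimal sketch of filtering out files that cannot contain ORC data before splits are generated. It is only an illustration: the class EmptyOrcFileFilter and its methods are hypothetical names, not part of ORC or Hive, and it uses java.nio.file instead of the Hadoop FileSystem API so that it runs standalone. The only fact it relies on is that the ORC magic is the 3-byte string "ORC".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the ORC or Hive API.
final class EmptyOrcFileFilter {
    // ORC files begin with the 3-byte magic "ORC"; this mirrors MAGIC.length.
    static final int MAGIC_LENGTH = "ORC".length();

    // A file no longer than the magic cannot hold any ORC data.
    static boolean isEffectivelyEmpty(long fileLength) {
        return fileLength <= MAGIC_LENGTH;
    }

    // Keeps only the paths that are worth generating splits for.
    static List<Path> filterForSplits(List<Path> candidates) throws IOException {
        List<Path> kept = new ArrayList<>();
        for (Path p : candidates) {
            if (!isEffectivelyEmpty(Files.size(p))) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

In an actual input format the length would presumably come from the file status that split generation has already listed, so the check would not need an extra file system call.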