orc-dev mailing list archives

From prasanthj <...@git.apache.org>
Subject [GitHub] orc issue #163: ORC-162. Handle 0 byte files as empty ORC files.
Date Tue, 29 Aug 2017 22:34:01 GMT
Github user prasanthj commented on the issue:

    Hive creates empty files only for MR, to support bucketed joins. Tez no longer creates empty
bucket files. Hive currently discards empty files during split generation. We can
do something similar in ORC's version of OrcInputFormat (or add an EmptyFilePathPattern to ignore
0-length files or files <= MAGIC.length). Creating splits for empty files is useless anyway.
As for calling the Reader directly with an empty file path, we can treat it as an empty file
with struct<>.
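
A minimal sketch of the length check suggested above, for illustration only. It is not code from Hive or ORC: the class and method names (EmptyOrcFileFilter, isEffectivelyEmpty, splittable) are hypothetical, and it uses java.nio on local files rather than the Hadoop FileSystem API so that it stays self-contained. The only fact it relies on is that an ORC file starts with the 3-byte magic string "ORC".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of ORC or Hive.
class EmptyOrcFileFilter {
    // ORC files begin with the 3-byte magic string "ORC".
    static final byte[] MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    // True if a file of this length cannot hold anything beyond the magic,
    // i.e. it is 0 bytes long or its length is <= MAGIC.length.
    static boolean isEffectivelyEmpty(long length) {
        return length <= MAGIC.length;
    }

    // Keeps only the paths that are worth generating splits for.
    static List<Path> splittable(List<Path> candidates) throws IOException {
        List<Path> kept = new ArrayList<>();
        for (Path p : candidates) {
            if (!isEffectivelyEmpty(Files.size(p))) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

In a real input format the same predicate would presumably be applied to the file lengths reported during split generation, so that empty bucket files never produce a split.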

