hive-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-12638) Hive should not create empty files in partitions
Date Wed, 09 Dec 2015 21:00:13 GMT
Owen O'Malley created HIVE-12638:
------------------------------------

             Summary: Hive should not create empty files in partitions
                 Key: HIVE-12638
                 URL: https://issues.apache.org/jira/browse/HIVE-12638
             Project: Hive
          Issue Type: Bug
          Components: File Formats
            Reporter: Owen O'Malley


Currently Hive creates empty files for buckets with no rows in a directory. I believe this
was originally because the SMB and bucket joins require files to be present to get InputSplits.
There are customers where this behavior leads to the creation of more than 200,000 empty ORC
files per hour on a cluster (with peaks of more than 725,000 per hour). We've also seen instances
where a single DataNode is involved in 5,600 of these empty ORC files within a 2-minute period.
This causes significant stress on HDFS at both the NameNode and DataNode and is completely
unnecessary.
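
As a rough illustration of the difference, the sketch below counts how many bucket files would
be written for one partition with and without the empty ones. It is not Hive code: the function
names, the skip_empty flag, and the plain modulo on integer keys (a stand-in for Hive's real
bucket hashing) are all assumptions made for the example.

```python
from collections import defaultdict


def assign_buckets(keys, num_buckets):
    """Group integer keys by bucket id.

    Plain modulo is an assumption standing in for Hive's bucket hashing.
    """
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % num_buckets].append(key)
    return buckets


def files_to_write(keys, num_buckets, skip_empty):
    """Return the bucket ids that would get a file in one partition.

    skip_empty is a hypothetical switch: False models the behavior
    described in this issue (one file per bucket, even with no rows),
    True models writing files only for buckets that received rows.
    """
    buckets = assign_buckets(keys, num_buckets)
    if skip_empty:
        return sorted(buckets)
    return list(range(num_buckets))


# Three rows that all land in the same bucket of a 32-bucket table.
keys = [3, 35, 67]
current = files_to_write(keys, 32, skip_empty=False)
proposed = files_to_write(keys, 32, skip_empty=True)
print(len(current), "files with empty buckets written")
print(len(proposed), "files when empty buckets are skipped")
```

With sparse partitions nearly every file is empty, which is where the file counts above would
come from. Any reader that expects one file per bucket would presumably have to treat a missing
file as an empty bucket.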



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
