hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-439) merge small files whenever possible
Date Tue, 21 Apr 2009 16:44:47 GMT
merge small files whenever possible

                 Key: HIVE-439
                 URL: https://issues.apache.org/jira/browse/HIVE-439
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: Namit Jain

In some cases the input to a Hive job consists of thousands of small files. Each file then
gets its own mapper. Most of the overhead of spawning all these mappers can be avoided
if the small files are combined into fewer, larger files.
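
As a rough illustration of the saving, the sketch below greedily groups small files into batches of at most a target size, so that one mapper (or one merged output file) would handle each batch. This is not Hive's implementation; the function name `plan_merges` and the 128 MB target are assumptions chosen for the example.

```python
from typing import List, Sequence


def plan_merges(file_sizes: Sequence[int], target_bytes: int) -> List[List[int]]:
    """Group file indices, in order, so each group totals at most target_bytes.

    A single file larger than target_bytes is placed in a group of its own.
    (Hypothetical helper for illustration only.)
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for index, size in enumerate(file_sizes):
        # Close the current group if adding this file would exceed the target.
        if current and current_total + size > target_bytes:
            groups.append(current)
            current = []
            current_total = 0
        current.append(index)
        current_total += size
    if current:
        groups.append(current)
    return groups


# Example: 5000 files of 1 MB each, merged toward an assumed 128 MB target.
sizes = [1_000_000] * 5000
groups = plan_merges(sizes, 128_000_000)
print(f"{len(sizes)} input files -> {len(groups)} merged groups")
```

With one mapper per group instead of one per file, the number of mappers to spawn falls from the number of input files to the number of groups.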

The problem can also be addressed by having a mapper span multiple blocks as in:


But it also makes sense in Hive to merge files whenever possible.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
