hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1071) Making RCFile "concatenatable" to reduce the number of files of the output
Date Wed, 20 Jan 2010 22:15:54 GMT
Making RCFile "concatenatable" to reduce the number of files of the output
--------------------------------------------------------------------------

                 Key: HIVE-1071
                 URL: https://issues.apache.org/jira/browse/HIVE-1071
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao


Hive automatically determines the number of reducers most of the time.
Sometimes this creates a lot of small output files.

Hive has an option to "merge" those small files through a map-reduce job.

Dhruba has an idea that can fix this even faster:
if we can make RCFile concatenatable, then we can simply tell the namenode to "merge" these
files.

Pros: This approach does not do any data I/O, so it's faster than the map-reduce merge.
Cons: We have to zero-fill the files to make sure they can be concatenated (all blocks except
the last have to be full HDFS blocks).
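
To give a rough sense of the zero-fill cost, here is a minimal sketch of the padding arithmetic.
It is not Hive or HDFS code: the class name ConcatPadding, its methods, the 128 MB block size and
the sample file lengths are all assumptions made for illustration. It only computes how many
zero bytes each file would need to end on a block boundary; it does not write anything, and it
says nothing about how an RCFile reader would skip the padding.

```java
public class ConcatPadding {

    /**
     * Zero bytes needed so that a file of fileLength bytes ends exactly on
     * a block boundary. Returns 0 if it already does.
     */
    static long paddingBytes(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid length or block size");
        }
        long remainder = fileLength % blockSize;
        return remainder == 0 ? 0 : blockSize - remainder;
    }

    /**
     * Total padding for files concatenated in the given order. The last
     * file is not padded, because only the blocks before the final one
     * have to be full.
     */
    static long totalPadding(long[] fileLengths, long blockSize) {
        long total = 0;
        for (int i = 0; i < fileLengths.length - 1; i++) {
            total += paddingBytes(fileLengths[i], blockSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;                           // assumed block size
        long[] lengths = {5 * mb, 200 * mb, 128 * mb, mb};   // hypothetical reducer outputs

        for (long len : lengths) {
            System.out.println(len / mb + " MB file -> "
                    + paddingBytes(len, blockSize) / mb + " MB padding if not last");
        }
        System.out.println("total padding: "
                + totalPadding(lengths, blockSize) / mb + " MB");
    }
}
```

With many small files, the padding can be much larger than the data itself, so the zero-fill
cost is mostly wasted space unless HDFS can avoid storing the padded bytes.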




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

