hadoop-hive-dev mailing list archives

From "Jeff Hammerbacher (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1071) Making RCFile "concatenatable" to reduce the number of files of the output
Date Wed, 20 Jan 2010 22:54:54 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803040#action_12803040 ]

Jeff Hammerbacher commented on HIVE-1071:
-----------------------------------------

bq. we could create an API in HDFS that concatenates a set of files into one file.

Would be a fantastic primitive to add to HDFS.
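To make the suggested primitive concrete, here is a toy, in-memory model of what a metadata-only concatenation at the namenode could look like. It is a sketch under stated assumptions, not HDFS code: `ToyNamenode`, `Block` and `concat` are hypothetical names, a file is modeled only as a list of block records, and the precondition (every block except the final one of the result must be a full block) is taken from the issue description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    block_id: int
    length: int  # bytes actually stored in this block


@dataclass
class ToyNamenode:
    """Hypothetical model: the namenode only maps each path to a list of blocks."""
    block_size: int
    files: Dict[str, List[Block]] = field(default_factory=dict)

    def concat(self, target: str, sources: List[str]) -> None:
        """Append the blocks of `sources` to `target` by editing metadata only.

        No block data is read or written; only the path -> block-list mapping changes.
        """
        paths = [target] + list(sources)
        if len(set(paths)) != len(paths):
            raise ValueError("target and sources must be distinct paths")
        for path in paths:
            if path not in self.files:
                raise FileNotFoundError(path)
        # Every file except the last one in the sequence must end on a block
        # boundary, so that only the final block of the result can be partial.
        for path in paths[:-1]:
            for blk in self.files[path]:
                if blk.length != self.block_size:
                    raise ValueError(f"{path}: block {blk.block_id} is not full")
        # Move the block references to the target; the data itself never moves.
        for path in sources:
            self.files[target].extend(self.files.pop(path))

    def length(self, path: str) -> int:
        return sum(blk.length for blk in self.files[path])
```

For example, with a 4-byte block size, concatenating `a` (one full block), `b` (two full blocks) and `c` (one 1-byte block) yields a single 13-byte file whose block list is simply the three lists joined; if `a` ended in a partial block the call would be rejected, which is exactly why the issue proposes zero-filling.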

> Making RCFile "concatenatable" to reduce the number of files of the output
> --------------------------------------------------------------------------
>
>                 Key: HIVE-1071
>                 URL: https://issues.apache.org/jira/browse/HIVE-1071
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>
> Hive automatically determines the number of reducers most of the time.
> Sometimes, this produces a lot of small files.
> Hive has an option to "merge" those small files through a map-reduce job.
> Dhruba has an idea that could fix this even faster:
> if we can make RCFile concatenatable, then we can simply tell the namenode to "merge" these files.
> Pros: This approach does not do any I/O, so it's faster.
> Cons: We have to zero-fill the files to make sure they can be concatenated (all blocks except the last have to be full HDFS blocks).
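The zero-fill requirement is simple arithmetic: a file of length n written with block size B needs (B - n mod B) mod B bytes of padding, and only the last file in the sequence may stay partial. A minimal sketch, assuming the hypothetical helper names `zero_fill_length` and `padding_plan` (not part of Hive or HDFS) and using 64 MB purely as an example block size:

```python
def zero_fill_length(file_length: int, block_size: int) -> int:
    """Zero bytes to append so that file_length becomes a multiple of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_length < 0:
        raise ValueError("file_length must be non-negative")
    # Python's % returns a value in [0, block_size) for a positive divisor.
    return (-file_length) % block_size


def padding_plan(file_lengths, block_size):
    """Padding per file when the files are concatenated in the given order.

    Every file except the last must end on a block boundary; the last may stay partial.
    """
    if not file_lengths:
        return []
    head = [zero_fill_length(n, block_size) for n in file_lengths[:-1]]
    return head + [0]


MB = 1024 * 1024
plan = padding_plan([10 * MB, 70 * MB, 5 * MB], 64 * MB)
print([p // MB for p in plan])  # [54, 58, 0]
```

In this example 85 MB of real data carries 112 MB of zeros, which illustrates the "Cons" above: for many small files the padding can outweigh the data. A reader of the concatenated file (RCFile, in this proposal) would also have to recognize and skip the zero-filled regions; how it would do so is an assumption not specified in the issue.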

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

