hive-dev mailing list archives

From "Jeff Hammerbacher (JIRA)" <>
Subject [jira] Commented: (HIVE-1071) Making RCFile "concatenatable" to reduce the number of files of the output
Date Wed, 20 Jan 2010 22:54:54 GMT


Jeff Hammerbacher commented on HIVE-1071:

bq. we could create an API in HDFS that concatenates a set of files into one file.

Would be a fantastic primitive to add to HDFS.

> Making RCFile "concatenatable" to reduce the number of files of the output
> --------------------------------------------------------------------------
>                 Key: HIVE-1071
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
> Hive automatically determines the number of reducers most of the time.
> Sometimes, this creates a lot of small files.
> Hive has an option to "merge" those small files through a map-reduce job.
> Dhruba has an idea that can fix it even faster:
> if we can make RCFile concatenatable, then we can simply tell the namenode to "merge" these files.
> Pros: This approach does not do any I/O, so it's faster.
> Cons: We have to zero-fill the files to make sure they can be concatenated (all blocks except the last have to be full HDFS blocks).
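
As a rough illustration of the zero-fill cost mentioned under "Cons", the sketch below computes how many padding bytes each file would need so that every file except the last ends on a block boundary. The function names and the 64 MiB block size are assumptions made for this example only; they are not part of Hive, RCFile, or HDFS.

```python
# Illustrative only: a 64 MiB block size is assumed here, the real value
# depends on the cluster's configuration.
BLOCK_SIZE = 64 * 1024 * 1024


def zero_fill_bytes(file_len: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the number of zero bytes needed to round file_len up to a
    whole number of blocks (0 if it is already block-aligned)."""
    if file_len < 0 or block_size <= 0:
        raise ValueError("file_len must be >= 0 and block_size must be > 0")
    return (-file_len) % block_size


def padded_lengths(file_lens, block_size: int = BLOCK_SIZE):
    """Return the file lengths after padding every file except the last,
    since only the final file may end in a partial block."""
    if not file_lens:
        return []
    head = [n + zero_fill_bytes(n, block_size) for n in file_lens[:-1]]
    return head + [file_lens[-1]]
```

With many small files the padding can dominate: under these assumptions, a 1 MiB file that is not last in the sequence would be padded with 63 MiB of zeros.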

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
