hive-dev mailing list archives

From "He Yongqiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2089) Add a new input format to be able to combine multiple .gz text files
Date Tue, 05 Apr 2011 01:20:06 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015706#comment-13015706 ]

He Yongqiang commented on HIVE-2089:
------------------------------------

Actually, I just found that recent Hadoop's CombineFileInputFormat supports non-splittable
files as input. So .gz files won't be a problem if the Hadoop version in use has that feature
checked in.

Another use case for it is Hive's SymlinkInputFormat, which may point to too many .gz files.
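
To make the idea concrete, here is a minimal sketch of combining non-splittable files into
fewer splits. It is NOT Hadoop's CombineFileInputFormat code (which also takes node and rack
locality into account); it is a simplified greedy illustration. The function name
combine_unsplittable, the max_split_bytes parameter, and the file names and sizes are all
hypothetical.

```python
from typing import List, Tuple


def combine_unsplittable(files: List[Tuple[str, int]],
                         max_split_bytes: int) -> List[List[str]]:
    """Greedily pack whole files into combined splits.

    Each file is treated as non-splittable (as a .gz file is), so it is
    always placed into exactly one split and never cut into pieces.
    A file larger than max_split_bytes simply gets a split of its own.
    """
    splits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0

    for path, size in files:
        # Close the current split if adding this file would exceed the cap.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size

    # Flush the last, partially filled split.
    if current:
        splits.append(current)
    return splits


# Hypothetical partition: three small .gz files and one large one.
example = [("a.gz", 40), ("b.gz", 40), ("c.gz", 40), ("d.gz", 200)]
print(combine_unsplittable(example, max_split_bytes=100))
```

With tens of thousands of small .gz files, grouping like this would reduce the number of map
tasks from one per file to roughly total size divided by the split cap.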

> Add a new input format to be able to combine multiple .gz text files
> --------------------------------------------------------------------
>
>                 Key: HIVE-2089
>                 URL: https://issues.apache.org/jira/browse/HIVE-2089
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: HIVE-2089.1.patch
>
>
> For files that are not splittable, CombineHiveInputFormat won't help. This JIRA is to
> add a new input format that can combine such files. This is very useful for partitions
> with tens of thousands of .gz files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
