hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-74) Hive can use CombineFileInputFormat for when the input are many small files
Date Thu, 20 Nov 2008 21:36:44 GMT

    [ https://issues.apache.org/jira/browse/HIVE-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649490#action_12649490 ]

Joydeep Sen Sarma commented on HIVE-74:
---------------------------------------

This looks like it is heading in the right direction. The only thing I didn't understand is how
we convey to CombineFileInputFormat that combined splits cannot span tables. There should be some
data structure that captures this information, which we need to push from Hive to CombineFileInputFormat.

Also, how is the required number of splits configured?
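
To make the two questions above concrete, here is a minimal sketch of the kind of structure being asked about: files are grouped by table first, and splits are only combined within a group. This is not the Hadoop or Hive API and not what the attached patch does; the function name, the (table, path, size) tuples, and the max_split_bytes parameter are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def combine_splits(files, max_split_bytes):
    """Pack small files into combined splits without crossing table boundaries.

    files: iterable of (table, path, size_bytes) tuples (hypothetical shape).
    max_split_bytes: assumed knob that caps the size of one combined split.
    Returns a list of (table, [paths]) tuples, one per combined split.
    """
    # Group files by table so that no split can ever mix two tables.
    by_table = defaultdict(list)
    for table, path, size in files:
        by_table[table].append((path, size))

    splits = []
    for table, entries in by_table.items():
        current, current_bytes = [], 0
        for path, size in entries:
            # Close the current split once adding this file would exceed the cap.
            if current and current_bytes + size > max_split_bytes:
                splits.append((table, current))
                current, current_bytes = [], 0
            current.append(path)
            current_bytes += size
        if current:
            splits.append((table, current))
    return splits
```

In this sketch the number of splits is not set directly; it falls out of the size cap, which is only one possible answer to the configuration question.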

> Hive can use CombineFileInputFormat for when the input are many small files
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-74
>                 URL: https://issues.apache.org/jira/browse/HIVE-74
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.20.0
>
>         Attachments: hiveCombineSplit.patch
>
>
> There are cases where the input to a Hive job is thousands of small files. In this case,
> there is a mapper for each file. Most of the overhead of spawning all these mappers can be
> avoided if Hive used CombineFileInputFormat, introduced via HADOOP-4565.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

