hive-user mailing list archives

From Gopal Vijayaraghavan <>
Subject Re: CombineHiveInputFormat does not call getSplits on custom InputFormat
Date Wed, 25 Feb 2015 17:15:38 GMT

There's a special interface in hive-1.0, which gives more information to
the input format.

But skipping combination entirely results in so many performance problems
that in Tez we are forced to abandon this approach and have Tez generate
grouped splits on the application master (which basically calls
InputFormat::getSplits(), then groups the results to get locality splits).
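
As a rough illustration of that grouping step, here is a minimal sketch in
plain Java. It is not the Tez implementation: the Split class, the group()
method and the targetBytes parameter are hypothetical names invented for this
example, and real split grouping also has to deal with racks, replicas and
min/max group sizes. The sketch only shows the idea of taking the splits an
InputFormat returned, bucketing them by preferred host, and packing each
bucket into larger groups.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these names come from the Hive or Tez APIs.
class SplitGrouper {

    // Hypothetical stand-in for an input split: a path, one preferred host, a byte length.
    static final class Split {
        final String path;
        final String host;
        final long length;

        Split(String path, String host, long length) {
            this.path = path;
            this.host = host;
            this.length = length;
        }
    }

    // Bucket splits by host, then pack each host's splits into groups.
    // A group is closed as soon as its total length reaches targetBytes.
    static List<List<Split>> group(List<Split> splits, long targetBytes) {
        Map<String, List<Split>> byHost = new LinkedHashMap<>();
        for (Split s : splits) {
            byHost.computeIfAbsent(s.host, h -> new ArrayList<>()).add(s);
        }

        List<List<Split>> groups = new ArrayList<>();
        for (List<Split> hostSplits : byHost.values()) {
            List<Split> current = new ArrayList<>();
            long size = 0;
            for (Split s : hostSplits) {
                current.add(s);
                size += s.length;
                if (size >= targetBytes) {
                    groups.add(current);
                    current = new ArrayList<>();
                    size = 0;
                }
            }
            // Keep whatever is left over for this host as a final, smaller group.
            if (!current.isEmpty()) {
                groups.add(current);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("/t/a", "host1", 40));
        splits.add(new Split("/t/b", "host2", 70));
        splits.add(new Split("/t/c", "host1", 70));
        splits.add(new Split("/t/d", "host1", 10));
        for (List<Split> g : group(splits, 100)) {
            System.out.println(g.get(0).host + " -> " + g.size() + " split(s)");
        }
    }
}
```

The point of the sketch is that the custom InputFormat's getSplits() still
runs first, so its splits are the input to grouping rather than being
bypassed.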

This is differentiated by hive.tez.input.format instead of just via


On 2/19/15, 10:09 AM, "Luke Lovett" <> wrote:

>I'm working on defining a custom InputFormat and OutputFormat for use
>with Hive. I'd like tables using these IF/OF to be native tables, so
>that I can LOAD DATA and INSERT INTO them. However, I'm finding that
>with the default CombineHiveInputFormat, the getSplits method of my
>InputFormat is not being called. If I "set
>;", then
>getSplits is called.
>What I want to know is:
>- Is this difference in behavior between CombineHiveInputFormat and
>HiveInputFormat intentional?
>- Is there any way of forcing CombineHiveInputFormat to call getSplits
>on my own InputFormat? I was reading through the code for
>CombineHiveInputFormat, and it looks like it might only call my own
>InputFormat's getSplits method if the table is non-native. I'm not sure
>if I'm interpreting this correctly.
>- Is it better to set "hive.input.format" to work around this, or to
>create a StorageHandler and make non-native tables?
>Thanks for any advice.
