hive-issues mailing list archives

From "Rui Li (JIRA)" <>
Subject [jira] [Commented] (HIVE-9153) Perf enhancement on CombineHiveInputFormat and HiveInputFormat
Date Tue, 20 Dec 2016 05:25:58 GMT


Rui Li commented on HIVE-9153:

I guess no configuration is suitable for all cases :) If I remember correctly, a smaller "mapreduce.input.fileinputformat.split.maxsize"
means more map tasks, which hurts performance when the data size is relatively large. So increasing
it should help in most cases. Of course, users should adjust it according to their cluster deployment,
executor resources, etc.
I'm not sure what you mean by performance test JIRAs. We have quite a few JIRAs that improve
performance, and I think each of them involves some simple performance test to verify the
improvement. But I don't remember all of them.
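
To make the split-size trade-off above concrete, here is a rough sketch of how the number of map tasks scales with "mapreduce.input.fileinputformat.split.maxsize". It assumes one map task per split and that splits are filled up to the maximum size; real split computation also depends on block boundaries, file layout and locality, so treat the numbers as estimates only. The function name and the 100 GiB input size are made up for illustration.

```python
def estimate_map_tasks(total_input_bytes: int, max_split_bytes: int) -> int:
    """Estimate map tasks as ceil(total input / max split size).

    Assumption: one task per split, and every split is packed up to
    max_split_bytes. This ignores block boundaries and data locality.
    """
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_input_bytes // max_split_bytes)


MiB = 1024 ** 2
GiB = 1024 ** 3

if __name__ == "__main__":
    total = 100 * GiB  # hypothetical input size
    for max_split in (64 * MiB, 256 * MiB, 1024 * MiB):
        tasks = estimate_map_tasks(total, max_split)
        print(f"split.maxsize={max_split // MiB} MiB -> ~{tasks} map tasks")
```

Under these assumptions, quadrupling the maximum split size cuts the estimated task count by a factor of four, which is why a larger value tends to help on big inputs.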

> Perf enhancement on CombineHiveInputFormat and HiveInputFormat
> --------------------------------------------------------------
>                 Key: HIVE-9153
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Brock Noland
>            Assignee: Rui Li
>             Fix For: 1.1.0
>         Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, HIVE-9153.2.patch,
> HIVE-9153.3.patch, screenshot.PNG
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. However,
> Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in Spark, it might make sense
> for us to use {{HiveInputFormat}} as well. We should evaluate this on a query which has many
> input splits such as {{select count(\*) from store_sales where something is not null}}.
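
As a rough illustration of why the choice of InputFormat matters for a query with many input splits, the sketch below compares split counts under two simplified models: one where each file is split on its own (roughly how a non-combining format behaves) and one where files are packed together up to a maximum split size (roughly how a combining format behaves). These models are assumptions made for illustration and are not Hive's actual split logic; the function names and file sizes are hypothetical.

```python
def splits_without_combining(file_sizes, split_bytes):
    """Simplified model: each non-empty file yields ceil(size / split_bytes) splits."""
    return sum(-(-size // split_bytes) for size in file_sizes if size > 0)


def splits_with_combining(file_sizes, max_split_bytes):
    """Simplified model: files are packed together, ceil(total / max_split_bytes) splits."""
    total = sum(file_sizes)
    return -(-total // max_split_bytes) if total > 0 else 0


MiB = 1024 ** 2

if __name__ == "__main__":
    # Hypothetical table stored as 10,000 small files of 8 MiB each.
    files = [8 * MiB] * 10_000
    split_size = 256 * MiB
    print("per-file splits:", splits_without_combining(files, split_size))
    print("combined splits:", splits_with_combining(files, split_size))
```

With many small files, the per-file model produces one task per file, while the combining model produces far fewer and larger tasks. Whether the extra tasks are acceptable depends on how cheap tasks are in the execution engine, which is the question the issue description raises for Spark.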

