hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <>
Subject [jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]
Date Tue, 16 Sep 2014 21:45:34 GMT


Xuefu Zhang commented on HIVE-8043:

[~lirui] The current Hive on Spark code borrows Tez's code for merging small files.
That code basically falls back to MR's way of doing this; please refer to
GenSparkUtils.processFileSinkOperators() for details. I think we can take a look at
HIVE-7704 to see whether there is anything similar we can do. Please do the research and
write down your findings. We don't need to implement it right away, as it's not critical
for our M1.

> Support merging small files [Spark Branch]
> ------------------------------------------
>                 Key: HIVE-8043
>                 URL:
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>              Labels: Spark-M1
> Hive currently supports merging small files with MR as the execution engine. There are
> options available for this, such as
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> hive.merge.sparkfiles was already introduced in HIVE-7810. To make it work, we might need
> a little more research and design.
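
For anyone doing the research asked for above, here is a minimal sketch of the kind of decision the merge options control, not Hive's actual implementation. It assumes the merge step is triggered when the average output file size falls below a threshold (the role hive.merge.smallfiles.avgsize plays on the MR path) and that merged files aim for a target size (the role of hive.merge.size.per.task). Neither property is named in this thread; the function names, the default values, and the greedy grouping are all illustrative assumptions.

```python
# Illustrative sketch only; names and defaults are assumptions, not Hive code.

AVG_SIZE_THRESHOLD = 16_000_000   # assumed stand-in for hive.merge.smallfiles.avgsize
TARGET_MERGED_SIZE = 256_000_000  # assumed stand-in for hive.merge.size.per.task


def needs_merge(file_sizes, merge_enabled, avg_threshold=AVG_SIZE_THRESHOLD):
    """Return True if a merge step should run for these output files.

    merge_enabled plays the role of a flag such as hive.merge.sparkfiles.
    """
    if not merge_enabled or len(file_sizes) < 2:
        return False
    average = sum(file_sizes) / len(file_sizes)
    return average < avg_threshold


def plan_merge(file_sizes, target_size=TARGET_MERGED_SIZE):
    """Greedily group files, in order, so each group stays within target_size.

    A single file larger than target_size ends up in a group of its own.
    """
    groups, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

With these assumptions, ten 1 MB files and merging enabled would trigger a merge, while two files of several hundred MB each would not.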

This message was sent by Atlassian JIRA