hive-dev mailing list archives

From "Rui Li (JIRA)" <>
Subject [jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]
Date Thu, 18 Sep 2014 11:43:34 GMT


Rui Li commented on HIVE-8043:

Hi [~xuefuz],

The DDL task that merges files comes from an ALTER TABLE statement. In this case, the DDL task
creates a {{MergeFileTask}}, which launches an MR job to merge the files. This feature currently
supports only RCFile and ORC tables.

Strangely, I couldn't find anything about this in the wiki or other official docs.
Maybe I'm missing something?

The main problem I see is that, ideally, we should launch the merge job on whichever execution
engine is configured. However, the DDL task goes through a different semantic analyzer,
{{DDLSemanticAnalyzer}}, and always launches an MR job. I don't think Tez handles this either.
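To make the idea concrete, here is a minimal Java sketch of choosing the merge job's engine from {{hive.execution.engine}} instead of hard-coding MR. Only the property name and its values ({{mr}}, {{tez}}, {{spark}}) come from Hive; {{MergeJobDispatchSketch}}, {{MergeEngine}} and {{chooseMergeEngine}} are hypothetical names, a plain {{Map}} stands in for the session configuration, and a real change would still have to be wired through {{DDLSemanticAnalyzer}} and {{MergeFileTask}}.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: pick the engine for a file-merge job from the session
// configuration instead of always launching an MR job.
class MergeJobDispatchSketch {

    // The three engines discussed in this thread (hypothetical enum, not a Hive type).
    enum MergeEngine { MR, TEZ, SPARK }

    // Hypothetical helper: map hive.execution.engine to the engine the merge job
    // should run on. A missing or unrecognized value falls back to MR, which is
    // what the DDL path does today for every engine.
    static MergeEngine chooseMergeEngine(Map<String, String> conf) {
        String engine = conf.getOrDefault("hive.execution.engine", "mr");
        switch (engine.trim().toLowerCase(Locale.ROOT)) {
            case "spark":
                return MergeEngine.SPARK;
            case "tez":
                return MergeEngine.TEZ;
            default:
                return MergeEngine.MR;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.execution.engine", "spark");
        System.out.println(chooseMergeEngine(conf)); // prints SPARK
    }
}
```

Falling back to MR keeps today's behavior for anything the dispatcher doesn't recognize, so adding a Spark or Tez merge path would not change existing MR setups.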

> Support merging small files [Spark Branch]
> ------------------------------------------
>                 Key: HIVE-8043
>                 URL:
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>              Labels: Spark-M1
>         Attachments: HIVE-8043.1-spark.patch
> Hive currently supports merging small files with MR as the execution engine. There are
> options available for this, such as
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> {{hive.merge.sparkfiles}} was already introduced in HIVE-7810. To make it work, we might need
> a little more research and design on this.
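The quoted options are separate switches: for MR, {{hive.merge.mapfiles}} covers map-only jobs and {{hive.merge.mapredfiles}} covers jobs with a reduce phase, while {{hive.merge.sparkfiles}} is the Spark counterpart. A minimal Java sketch of how a planner might pick the relevant flag for the job that produced the small files is below. It assumes a plain {{Map}} in place of HiveConf and assumes defaults of true, false and false respectively; {{MergeFlagSketch}}, {{JobKind}} and {{isMergeRequired}} are hypothetical names, not Hive's planner code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: decide whether to add a merge step after a job writes
// its output. Only the property names come from this issue; the defaults are
// assumptions for illustration.
class MergeFlagSketch {

    // Hypothetical description of the job that produced the small files.
    enum JobKind { MR_MAP_ONLY, MR_MAP_REDUCE, SPARK }

    // Read a boolean property, using the given default when it is unset.
    static boolean flag(Map<String, String> conf, String key, boolean dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Boolean.parseBoolean(v.trim());
    }

    // Pick the merge switch that applies to this kind of job.
    static boolean isMergeRequired(Map<String, String> conf, JobKind kind) {
        switch (kind) {
            case MR_MAP_ONLY:
                return flag(conf, "hive.merge.mapfiles", true);
            case MR_MAP_REDUCE:
                return flag(conf, "hive.merge.mapredfiles", false);
            case SPARK:
                return flag(conf, "hive.merge.sparkfiles", false);
            default:
                throw new IllegalArgumentException("unknown job kind: " + kind);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.merge.sparkfiles", "true");
        System.out.println(isMergeRequired(conf, JobKind.SPARK));         // prints true
        System.out.println(isMergeRequired(conf, JobKind.MR_MAP_REDUCE)); // prints false
    }
}
```

Deciding whether to merge is the easy part; the open question in this issue is which engine should run the merge step itself once the Spark flag is on.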

This message was sent by Atlassian JIRA
