hive-dev mailing list archives

From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]
Date Thu, 18 Sep 2014 11:43:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138824#comment-14138824 ]

Rui Li commented on HIVE-8043:
------------------------------

Hi [~xuefuz],

The DDL task that merges files is triggered by an ALTER TABLE statement:
{code}
ALTER TABLE tbl CONCATENATE;
{code}
In this case, the DDL task creates a {{MergeFileTask}}, which launches an MR job to merge
the files. This feature currently supports only RCFile and ORC tables.
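As a hedged illustration (the table name {{logs}} and the partition column {{ds}} are hypothetical, not taken from this issue), the same statement can also be scoped to a single partition of an RCFile or ORC table:
{code}
-- Merge the small files of one partition only.
-- Assumes logs is stored as RCFile or ORC and is partitioned by ds.
ALTER TABLE logs PARTITION (ds='2014-09-18') CONCATENATE;
{code}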

Strangely, I couldn't find anything about this in the wiki or other official docs.
Maybe I'm missing something?

The main problem I see here is that, ideally, we should launch the merge job on whichever
execution engine is configured. But the DDL task goes through a different semantic analyzer,
{{DDLSemanticAnalyzer}}, and always launches an MR job. I don't think Tez handles this either.

> Support merging small files [Spark Branch]
> ------------------------------------------
>
>                 Key: HIVE-8043
>                 URL: https://issues.apache.org/jira/browse/HIVE-8043
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>              Labels: Spark-M1
>         Attachments: HIVE-8043.1-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. There are
> options available for this, such as
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> {{hive.merge.sparkfiles}} was already introduced in HIVE-7810. To make it work, we might need
> a little more research and design.
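>
> As a rough sketch of how these options are typically switched on for a session (the values
> are illustrative assumptions, and {{hive.merge.sparkfiles}} is expected to be recognized only
> on the Spark branch):
> {code}
> -- Merge small files at the end of a map-only job.
> SET hive.merge.mapfiles=true;
> -- Merge small files at the end of a map-reduce job.
> SET hive.merge.mapredfiles=true;
> -- Spark counterpart introduced in HIVE-7810; assumed to apply on the Spark branch only.
> SET hive.merge.sparkfiles=true;
> {code}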



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
