hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] Commented: (HIVE-1806) The merge criteria on dynamic partitons should be per partiton
Date Thu, 23 Dec 2010 19:59:46 GMT


Namit Jain commented on HIVE-1806:

Mostly looks good - a minor comment.

In the new test that you added, the merge job is a map-only job although you are using HiveInputFormat.
This is because you are using Hadoop 0.20, which supports CombineHiveInputFormat.
Do you think that is the correct behavior? It looks OK, I just wanted to confirm.

> The merge criteria on dynamic partitons should be per partiton
> --------------------------------------------------------------
>                 Key: HIVE-1806
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1806.2.patch, HIVE-1806.3.patch, HIVE-1806.4.patch, HIVE-1806.patch
> Currently, the criterion for whether a merge job should be fired on dynamically generated partitions
> is the average size of the files across all dynamic partitions. It is very common that
> some dynamic partitions contain mostly large files and some contain mostly small files.
> Even when the average size of all the files is larger than hive.merge.smallfiles.avgsize,
> we should merge only those partitions that contain small files.
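
The difference between the two criteria can be illustrated with a minimal sketch. This is not Hive's implementation: the function names, the partition names, the file sizes, and the 16 MB threshold are all assumptions chosen for illustration, with the threshold standing in for hive.merge.smallfiles.avgsize.

```python
from typing import Dict, List

MB = 1024 * 1024


def average(sizes: List[int]) -> float:
    """Average file size in bytes; 0.0 for an empty list."""
    return sum(sizes) / len(sizes) if sizes else 0.0


def global_merge_needed(partitions: Dict[str, List[int]], threshold: int) -> bool:
    """Old criterion (hypothetical sketch): one decision for all partitions,
    based on the average file size across every dynamic partition."""
    all_sizes = [s for sizes in partitions.values() for s in sizes]
    return bool(all_sizes) and average(all_sizes) < threshold


def partitions_to_merge(partitions: Dict[str, List[int]], threshold: int) -> List[str]:
    """Proposed criterion (hypothetical sketch): decide per partition,
    using only that partition's own average file size."""
    return sorted(
        name
        for name, sizes in partitions.items()
        if sizes and average(sizes) < threshold
    )


# Illustrative data: one partition with large files, one with small files.
example = {
    "ds=2010-12-01": [256 * MB, 300 * MB],
    "ds=2010-12-02": [1 * MB, 2 * MB, 3 * MB],
}
threshold = 16 * MB  # assumed value standing in for hive.merge.smallfiles.avgsize

print(global_merge_needed(example, threshold))
print(partitions_to_merge(example, threshold))
```

With this data the overall average is well above the threshold, so the global check triggers no merge at all, while the per-partition check selects only the partition holding the small files.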

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
