hive-dev mailing list archives

From "Ning Zhang (JIRA)" <>
Subject [jira] Commented: (HIVE-1900) a mapper should be able to span multiple partitions
Date Fri, 07 Jan 2011 22:35:46 GMT


Ning Zhang commented on HIVE-1900:

Namit, do you mean bucketized sort-merge join? In that case, don't you need to use a specialized
InputFormat and RecordReader? If we allow mappers to get inputs from multiple partitions, we
need to ensure HiveInputFormat and CombineHiveInputFormat and the RecordReaders be partition

2) is important because we don't want to merge different partitions in one file. Otherwise
you need a dynamic partition insert for the merge which may generate multiple small files
for a partition again. 

3) If TableScanOperator can take multiple partitions, the stats have to be gathered according
to the input partition column values. Currently the partition column value is checked only for
the first row, and all rows are assumed to have the same partition column value. If we allow
multiple partitions in a mapper, we have to check the partition column values for each row.
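
To illustrate point 3, here is a minimal sketch of per-row stats gathering. It is not
Hive code: the class PerPartitionRowCounter, the Row type, and the "ds=..." partition
values are all hypothetical, and a row count stands in for whatever stats are actually
collected. The only point is that the partition value is read from every row instead
of being taken from the first row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Hive TableScanOperator implementation.
public class PerPartitionRowCounter {

    // Hypothetical row: the partition column value plus an opaque payload.
    public static class Row {
        public final String partitionValue;
        public final String payload;

        public Row(String partitionValue, String payload) {
            this.partitionValue = partitionValue;
            this.payload = payload;
        }
    }

    // One counter per partition value seen by this mapper.
    private final Map<String, Long> rowCounts = new LinkedHashMap<>();

    // Check the partition value of each row, so rows from different
    // partitions are never attributed to the first row's partition.
    public void process(Row row) {
        rowCounts.merge(row.partitionValue, 1L, Long::sum);
    }

    public Map<String, Long> rowCounts() {
        return rowCounts;
    }

    public static void main(String[] args) {
        PerPartitionRowCounter counter = new PerPartitionRowCounter();
        counter.process(new Row("ds=2011-01-06", "a"));
        counter.process(new Row("ds=2011-01-07", "b"));
        counter.process(new Row("ds=2011-01-06", "c"));
        // Prints one entry per partition value with its row count.
        System.out.println(counter.rowCounts());
    }
}
```

The per-row map lookup is the extra cost compared with checking only the first row; a
real implementation could presumably cache the last seen partition value to avoid most
lookups when rows arrive grouped by partition.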

> a mapper should be able to span multiple partitions
> ---------------------------------------------------
>                 Key: HIVE-1900
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
> Currently, a mapper only spans a single partition, which creates a problem in the presence
> of many small partitions (which is becoming a common use case at Facebook).
> If the plan is the same, a mapper should be able to span files across multiple partitions

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
