hive-dev mailing list archives

From "Kevin Wilfong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3541) Allow keeping the bucket order while streaming bucketed table
Date Sat, 06 Oct 2012 01:44:02 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470851#comment-13470851 ]

Kevin Wilfong commented on HIVE-3541:
-------------------------------------

It would be good if the bucketing were maintained even in the face of selects, filters, and
other operators through which the values of the columns the table is bucketed on pass
unmodified.
                
> Allow keeping the bucket order while streaming bucketed table
> -------------------------------------------------------------
>
>                 Key: HIVE-3541
>                 URL: https://issues.apache.org/jira/browse/HIVE-3541
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Igor Kabiljo
>            Priority: Minor
>
> If we have a bucketed table, for example table_a with columns col_key and col_value (bucketed
> on col_key), and we need to create a new derived bucketed table (for example, by SELECT col_key,
> col_value*2 FROM table_a), it would be fastest if this could be done in a single streaming
> map-only job.
> By specifying:
> SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> we can make sure that each input bucket is read by exactly one mapper, and that
> each mapper outputs exactly one file. With:
> SET hive.merge.mapfiles = false;
> SET hive.merge.mapredfiles = false;
> SET hive.enforce.bucketing = false;
> we can make sure those files are inserted as is into the output table.
> However, bucket order is then not kept, so the resulting table is not bucketed correctly.
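
For reference, the setup described in the issue could be put together roughly as follows.
This is only a sketch: table_b, the column types, and the bucket count of 32 are
assumptions that do not appear in the issue, and table_b would need to be declared with
the same number of buckets as table_a for the approach to make sense.

```sql
-- Hypothetical target table: the name, column types, and bucket count are
-- illustrative assumptions, not taken from the issue.
CREATE TABLE table_b (col_key INT, col_value INT)
CLUSTERED BY (col_key) INTO 32 BUCKETS;

-- Settings quoted from the issue description.
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
SET hive.merge.mapfiles = false;
SET hive.merge.mapredfiles = false;
SET hive.enforce.bucketing = false;

-- Map-only query: col_key passes through unmodified, only col_value changes.
INSERT OVERWRITE TABLE table_b
SELECT col_key, col_value * 2
FROM table_a;
```

As the issue notes, with these settings the output files are not guaranteed to land in the
same order as the input buckets, so table_b may end up incorrectly bucketed.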

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
