hive-dev mailing list archives

From "Igor Kabiljo (JIRA)"
Subject [jira] [Created] (HIVE-3541) Allow keeping the bucket order while streaming bucketed table
Date Sat, 06 Oct 2012 01:34:02 GMT
Igor Kabiljo created HIVE-3541:

             Summary: Allow keeping the bucket order while streaming bucketed table
                 Key: HIVE-3541
             Project: Hive
          Issue Type: Improvement
            Reporter: Igor Kabiljo
            Priority: Minor

If we have a bucketed table, for example table_a with columns col_key and col_value (bucketed
on col_key), and we need to create a new derived bucketed table (for example, with SELECT col_key,
col_value*2 FROM table_a), it would be fastest to do it in a single streaming map-only job.
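A minimal HiveQL sketch of this scenario follows. The column types, the bucket count of 4, and the name table_b for the derived table are illustrative assumptions, not part of the original report.

```sql
-- Hypothetical source table, bucketed on col_key (types and bucket count assumed).
CREATE TABLE table_a (col_key INT, col_value INT)
CLUSTERED BY (col_key) INTO 4 BUCKETS;

-- Hypothetical derived table with the same bucketing column and bucket count,
-- so each input bucket corresponds to exactly one output bucket.
CREATE TABLE table_b (col_key INT, col_value INT)
CLUSTERED BY (col_key) INTO 4 BUCKETS;

-- Only col_value is transformed; col_key passes through unchanged,
-- so no row changes bucket and no shuffle is logically required.
INSERT OVERWRITE TABLE table_b
SELECT col_key, col_value * 2 FROM table_a;
```

Because col_key is copied unchanged, every row's bucket in table_b is the same as its bucket in table_a, which is what makes a map-only, one-file-per-bucket plan possible in principle.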

By specifying:
we can make sure that each input bucket is read by exactly one mapper and that each mapper
outputs exactly one file. With:
SET hive.merge.mapfiles = false;
SET hive.merge.mapredfiles = false;
SET hive.enforce.bucketing = false;
we can make sure those files are inserted as-is into the output table.
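However, the bucket order is not preserved: the file produced from input bucket N is not
guaranteed to become bucket N of the output table, so the resulting table is not bucketed correctly.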
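One way to detect this problem is to check, for each data file, which bucket its rows' keys should hash to. The sketch below assumes a derived table named table_b (hypothetical) that is CLUSTERED BY (col_key) INTO 4 BUCKETS, a Hive version that supports the INPUT__FILE__NAME virtual column, and that pmod(hash(col_key), 4) matches Hive's internal bucket assignment (this is an assumption that should hold for a single bucketing column and a power-of-two bucket count).

```sql
-- Assumes table_b is CLUSTERED BY (col_key) INTO 4 BUCKETS (hypothetical).
-- In a correctly bucketed table, the N-th data file (in file-name order)
-- should contain only rows whose expected_bucket equals N.
SELECT INPUT__FILE__NAME       AS data_file,
       pmod(hash(col_key), 4)  AS expected_bucket,
       count(*)                AS num_rows
FROM table_b
GROUP BY INPUT__FILE__NAME, pmod(hash(col_key), 4)
ORDER BY data_file, expected_bucket;
```

If a file shows an expected_bucket that differs from its position, the files were written in a different order than the input buckets, which is the situation this issue describes.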

