hive-dev mailing list archives

From "Igor Kabiljo (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-3541) Allow keeping the bucket order while streaming bucketed table
Date Sat, 06 Oct 2012 01:34:02 GMT
Igor Kabiljo created HIVE-3541:
----------------------------------

             Summary: Allow keeping the bucket order while streaming bucketed table
                 Key: HIVE-3541
                 URL: https://issues.apache.org/jira/browse/HIVE-3541
             Project: Hive
          Issue Type: Improvement
            Reporter: Igor Kabiljo
            Priority: Minor


If we have a bucketed table, for example table_a with columns col_key and col_value (bucketed
on col_key), and we need to create a new derived bucketed table (for example, by SELECT col_key,
col_value*2 FROM table_a), it would be fastest to do this in a single streaming map-only
job.

By specifying:
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
we can make sure that each input bucket is read by exactly one mapper, and that each mapper
writes exactly one output file. With:
SET hive.merge.mapfiles = false;
SET hive.merge.mapredfiles = false;
SET hive.enforce.bucketing = false;
we can make sure those files are inserted as-is into the output table.
However, with these settings the bucket order is not kept, so the resulting table is not
bucketed correctly.
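
For illustration, the whole scenario could be put together roughly as follows. This is
only a sketch: the target table name table_b, the column types, and the bucket count of 32
are assumptions that do not appear in the report, and table_b is assumed to use the same
bucketing column and bucket count as table_a.

```sql
-- Hypothetical target table (name, column types and bucket count are assumptions).
CREATE TABLE table_b (col_key INT, col_value INT)
CLUSTERED BY (col_key) INTO 32 BUCKETS;

-- One mapper per input bucket file.
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Do not merge the mapper outputs, and do not add a reduce stage to re-bucket.
SET hive.merge.mapfiles = false;
SET hive.merge.mapredfiles = false;
SET hive.enforce.bucketing = false;

-- Map-only streaming job that derives the new table from table_a.
INSERT OVERWRITE TABLE table_b
SELECT col_key, col_value*2 FROM table_a;
```

With these settings the job writes one output file per input bucket, but the order of those
files does not necessarily follow the order of the input buckets, which is the problem this
issue asks to address.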

