hive-dev mailing list archives

From "xiaoyu wang (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-2775) allow the number of files to be a multiple of bucketed table
Date Thu, 02 Feb 2012 05:03:55 GMT
allow the number of files to be a multiple of bucketed table
------------------------------------------------------------

                 Key: HIVE-2775
                 URL: https://issues.apache.org/jira/browse/HIVE-2775
             Project: Hive
          Issue Type: New Feature
          Components: Metastore
            Reporter: xiaoyu wang


Currently, a Hive bucketed table requires the number of files to match the number of buckets in
order for sampling to be correct. This is very restrictive: for example, we can only populate the table
using a fixed number of reducers, which can be a bottleneck.

The idea is to introduce the concepts of a "physical bucket" and a "logical bucket". The "physical bucket"
count is the number of files, and the "logical bucket" count is the number of buckets stored in the metadata
for the bucketed table. By allowing the physical bucket count to be a multiple of the logical bucket count,
we can sample correctly while still scaling up the number of reducers that write the table.
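Why the "multiple" constraint is enough can be sketched in a few lines. This is a minimal illustration, not Hive code: it assumes rows are routed to files by a non-negative hash modulo the file count P (the masking idiom and all function names here are hypothetical). If the logical bucket count L divides P, then (h mod P) mod L == h mod L, so every file holds rows of exactly one logical bucket, and sampling a logical bucket reduces to reading a fixed set of files.

```python
# Hypothetical sketch: P = number of files ("physical buckets"),
# L = bucket count in the metastore ("logical buckets"), P a multiple of L.


def physical_bucket(hash_code: int, num_files: int) -> int:
    """File index a row lands in, assuming hash-mod-P routing (an assumption)."""
    # Mask to a non-negative value, like the Java idiom (h & Integer.MAX_VALUE) % n.
    return (hash_code & 0x7FFFFFFF) % num_files


def logical_bucket_of_file(file_index: int, num_logical: int) -> int:
    """Logical bucket stored in a file; valid only when L divides P."""
    # Since h = q*P + r and P = k*L, h mod L == r mod L, so file r holds
    # only rows belonging to logical bucket r mod L.
    return file_index % num_logical


def files_for_logical_bucket(bucket: int, num_logical: int, num_files: int) -> list:
    """0-based file indexes that together make up one logical bucket."""
    if num_files % num_logical != 0:
        raise ValueError("number of files must be a multiple of the logical bucket count")
    return [i for i in range(num_files) if i % num_logical == bucket]
```

For example, with 4 logical buckets written by 12 reducers, `files_for_logical_bucket(1, 4, 12)` returns `[1, 5, 9]`: sampling logical bucket 1 means reading those three files instead of a single one.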


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
