hive-dev mailing list archives

From "Feng Lu (JIRA)" <>
Subject [jira] [Created] (HIVE-3421) Column Level Top K Values Statistics
Date Fri, 31 Aug 2012 18:59:08 GMT
Feng Lu created HIVE-3421:

             Summary: Column Level Top K Values Statistics
                 Key: HIVE-3421
             Project: Hive
          Issue Type: New Feature
            Reporter: Feng Lu
            Assignee: Feng Lu

Compute (estimate) the top k values for each column, and record the most skewed column in the
table's skewed info if the user hasn't specified skew.
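This message does not say which top-k estimator is meant. The sketch below is minimal and assumes rows arrive as Python dicts. It uses the Space-Saving algorithm (Metwally, Agrawal and El Abbadi) as an illustrative stand-in, not necessarily the algorithm this issue refers to. It then picks the "most skewed" column by the share of rows that its estimated top k values cover. That skew measure, the helper `most_skewed_column`, and the `k`/`capacity` defaults are hypothetical, not Hive's implementation.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class SpaceSaving:
    """Space-Saving top-k estimator, used here as an illustrative stand-in.

    Keeps at most `capacity` counters. When an unseen value arrives and every
    counter is taken, the value with the smallest count is evicted and the
    newcomer inherits that count plus one, so an estimate can exceed the true
    frequency by at most the inherited amount (tracked in `errors`).
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}
        self.errors: Dict[Hashable, int] = {}
        self.total = 0

    def add(self, value: Hashable) -> None:
        self.total += 1
        if value in self.counts:
            self.counts[value] += 1
        elif len(self.counts) < self.capacity:
            self.counts[value] = 1
            self.errors[value] = 0
        else:
            # Linear scan for the minimum keeps the sketch short; a real
            # implementation would use a structure with cheaper min lookup.
            victim = min(self.counts, key=self.counts.__getitem__)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[value] = floor + 1
            self.errors[value] = floor

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return up to k (value, estimated_count) pairs, largest first."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


def most_skewed_column(
    rows: Iterable[Dict[str, Hashable]],
    columns: List[str],
    k: int = 3,
    capacity: int = 10,
) -> Tuple[str, List[Tuple[Hashable, int]]]:
    """Hypothetical helper: pick the column whose estimated top k values
    cover the largest share of rows (an assumed skew measure)."""
    sketches = {c: SpaceSaving(capacity) for c in columns}
    for row in rows:
        for c in columns:
            sketches[c].add(row[c])

    def skew(c: str) -> float:
        s = sketches[c]
        return sum(n for _, n in s.top(k)) / s.total if s.total else 0.0

    best = max(columns, key=skew)
    return best, sketches[best].top(k)


if __name__ == "__main__":
    # 80% of rows share one country; every id is distinct.
    rows = [{"id": i, "country": "US" if i % 10 < 8 else f"C{i % 10}"}
            for i in range(100)]
    column, values = most_skewed_column(rows, ["id", "country"])
    print(column, values)  # country [('US', 80), ('C8', 10), ('C9', 10)]
```

In the example, `country` has only three distinct values. That fits within the counter capacity, so its counts are exact and its top-3 share is 1.0. The 100 distinct `id` values spread across the counters, so the top-3 share for `id` stays well below that.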

This feature depends on ListBucketing (support for creating skewed tables).

The top k values of every column could be added to skewed info if, in the future, skewed info
supports multiple independent columns.

The TopK algorithm is based on this paper:

This message is automatically generated by JIRA.
