hive-dev mailing list archives

From "Laljo John Pullokkaran (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-6540) Support Multi Column Stats
Date Mon, 03 Mar 2014 21:48:20 GMT
Laljo John Pullokkaran created HIVE-6540:
--------------------------------------------

             Summary: Support Multi Column Stats
                 Key: HIVE-6540
                 URL: https://issues.apache.org/jira/browse/HIVE-6540
             Project: Hive
          Issue Type: Improvement
            Reporter: Laljo John Pullokkaran
            Assignee: Laljo John Pullokkaran


For joins involving compound predicates, multi-column statistics can be used to compute
the NDV (number of distinct values) accurately.

The objective is to compute the NDV of more than one column.

Example: compute the NDV of (x, y, z).

The inner join R1 IJ R2 on R1.x=R2.x and R1.y=R2.y and R1.z=R2.z can use
max(NDV(R1.x, R1.y, R1.z), NDV(R2.x, R2.y, R2.z)) as the join NDV (and hence to derive
the join selectivity).
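
The idea above can be illustrated with a small Python sketch. It is not Hive code:
the helper names (ndv, estimate_join_rows) and the toy tables are hypothetical, the
NDV is computed exactly with a set rather than with a sketch-based estimator, and the
row estimate |R1| * |R2| / max(NDV) is the usual textbook equi-join formula, assumed
here as one way to turn the join NDV into a selectivity.

```python
def ndv(rows, cols):
    """Exact number of distinct value combinations over the given columns."""
    return len({tuple(row[c] for c in cols) for row in rows})


def estimate_join_rows(r1, r2, cols):
    """Estimate inner-join cardinality as |R1| * |R2| / max(NDV1, NDV2).

    Assumption: textbook equi-join estimate, not necessarily what Hive does.
    """
    join_ndv = max(ndv(r1, cols), ndv(r2, cols))
    if join_ndv == 0:
        return 0.0
    return len(r1) * len(r2) / join_ndv


# Toy data in which y and z are fully determined by x (correlated columns).
r1 = [{"x": i % 4, "y": (i % 4) * 10, "z": (i % 4) * 100} for i in range(8)]
r2 = [{"x": i % 2, "y": (i % 2) * 10, "z": (i % 2) * 100} for i in range(6)]

cols = ("x", "y", "z")

# NDV of the column group (x, y, z) taken together.
multi_column_ndv = ndv(r1, cols)

# Product of single-column NDVs, i.e. what an independence assumption gives.
independent_ndv = ndv(r1, ("x",)) * ndv(r1, ("y",)) * ndv(r1, ("z",))

estimated_rows = estimate_join_rows(r1, r2, cols)

print(multi_column_ndv, independent_ndv, estimated_rows)
```

On this toy data the multi-column NDV of R1 is 4, while multiplying the single-column
NDVs gives 64, which is the kind of error multi-column stats are meant to avoid when
the join columns are correlated.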

http://www.oracle-base.com/articles/11g/statistics-collection-enhancements-11gr1.php#multi_column_statistics
http://blogs.msdn.com/b/ianjo/archive/2005/11/10/491548.aspx
http://developer.teradata.com/database/articles/removing-multi-column-statistics-a-process-for-identification-of-redundant-statist



--
This message was sent by Atlassian JIRA
(v6.2#6252)
