hive-issues mailing list archives

From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16274) Support tuning of NDV of columns using lower/upper bounds
Date Thu, 23 Mar 2017 22:00:43 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939290#comment-15939290 ]

Jason Dere commented on HIVE-16274:
-----------------------------------

Unfortunately this scales the nDV of all columns in the same way, which makes the nDV look
high for every column. If the column has a min/max range, or data type limits, could we at
least bound the estimate by those?
The reason for having this is that the density function estimate does not work well in some
cases. Longer term it might be nice to have some improvements in that area as well.
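
To illustrate the kind of bound suggested above, here is a minimal sketch in Python. It
assumes an integer column whose min and max are known from column statistics; the function
name and parameters are hypothetical and are not taken from the Hive code base or the patch.

```python
def bound_ndv_by_range(ndv, col_min, col_max):
    """Cap an nDV estimate for an integer column by the size of its value range.

    Hypothetical helper for illustration only; not Hive's implementation.
    """
    if col_min > col_max:
        raise ValueError("col_min must not exceed col_max")
    # An integer column with values in [col_min, col_max] cannot hold more
    # than (col_max - col_min + 1) distinct values.
    range_size = col_max - col_min + 1
    return min(ndv, range_size)
```

For example, a month column with values between 1 and 12 would never be estimated above 12
distinct values, however far the scaled estimate overshoots. A similar cap could be derived
from data type limits (for instance 2 for a boolean), but that is not shown here.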

> Support tuning of NDV of columns using lower/upper bounds
> ---------------------------------------------------------
>
>                 Key: HIVE-16274
>                 URL: https://issues.apache.org/jira/browse/HIVE-16274
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-16274.01.patch
>
>
> For partitioned tables, the number of distinct values (nDV) estimate for a column is by
> default set to the largest nDV value in any of the partitions being considered, which is
> a lower bound on the nDV estimate.
> This provides a config setting that allows the estimate to be set to a specified fraction
> (0.0 - 1.0) of the upper bound on the nDV estimate (the sum of the nDVs of all partitions).
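
The two bounds in the description can be sketched as follows. This is a minimal Python
illustration, not the code in the attached patch. It assumes one possible reading of the
setting: the fraction interpolates linearly between the lower bound (the largest
per-partition nDV) and the upper bound (the sum of the per-partition nDVs), so that 0.0
keeps the default behavior. The function and parameter names are hypothetical.

```python
def tuned_ndv(partition_ndvs, fraction):
    """Estimate a column's nDV across partitions from per-partition nDVs.

    Assumption: `fraction` interpolates between the lower and upper bound.
    Hypothetical helper for illustration only; not Hive's implementation.
    """
    if not partition_ndvs:
        raise ValueError("need at least one partition nDV")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0.0, 1.0]")
    # Lower bound: every partition's distinct values also occur in the table,
    # so the table has at least as many as the largest partition.
    lower = max(partition_ndvs)
    # Upper bound: reached when no value is shared between partitions.
    upper = sum(partition_ndvs)
    return int(lower + (upper - lower) * fraction)
```

With per-partition nDVs of 10, 20 and 30, the lower bound is 30 and the upper bound is 60.
Because the same fraction applies to every column, a column whose values repeat across
partitions (such as a status code) is scaled up just as much as a column whose values are
disjoint across partitions (such as a timestamp).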



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
