hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12309) TableScan should use column stats when available for better data size estimate
Date Tue, 10 Nov 2015 20:39:11 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999307#comment-14999307 ]

Prasanth Jayachandran commented on HIVE-12309:
----------------------------------------------

Left a minor comment in RB. I am worried about the scenario of INCOMPLETE column stats. What
happens if column stats are missing or stale? Raw data size will always be updated (if the
appropriate configs are on and if the file format supports it), but column stats freshness
is not guaranteed. How do we deal with that in the estimation?
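
To make the concern concrete, here is a minimal sketch of one possible fallback policy. It is not the code in the attached patches and not Hive's actual API: the class name, method name, and parameters are all hypothetical. It assumes the planner knows whether column stats are complete and has an average width per column, and it falls back to rawDataSize otherwise. Note that it cannot detect stale stats, which is the open part of the question.

```java
public class ScanSizeEstimate {

    /**
     * Hypothetical helper, not part of Hive: estimates the TableScan data size.
     *
     * @param numRows             row count for the table or partition
     * @param rawDataSize         raw data size recorded in basic stats
     * @param columnStatsComplete true only if every projected column has stats
     * @param avgColumnWidths     assumed average width in bytes of each projected column
     */
    static long estimateDataSize(long numRows,
                                 long rawDataSize,
                                 boolean columnStatsComplete,
                                 double[] avgColumnWidths) {
        // Missing or partial column stats: trust rawDataSize instead.
        if (!columnStatsComplete || numRows <= 0 || avgColumnWidths.length == 0) {
            return rawDataSize;
        }
        // Complete column stats: size is the row count times the summed column widths.
        double rowWidth = 0.0;
        for (double width : avgColumnWidths) {
            rowWidth += width;
        }
        return Math.round(numRows * rowWidth);
    }

    public static void main(String[] args) {
        double[] widths = {4.0, 8.0};
        // Complete stats: uses the column-based estimate.
        System.out.println(estimateDataSize(100, 5000, true, widths));
        // Incomplete stats: falls back to rawDataSize.
        System.out.println(estimateDataSize(100, 5000, false, widths));
    }
}
```

A policy like this keeps TableScan consistent with downstream operators when stats are complete, but a stale-yet-complete set of column stats would still be used as-is.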

> TableScan should use column stats when available for better data size estimate
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-12309
>                 URL: https://issues.apache.org/jira/browse/HIVE-12309
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-12309.2.patch, HIVE-12309.patch
>
>
> Currently, all other operators use column stats to figure out data size, whereas TableScan
> relies on rawDataSize. This mismatch can result in TS having a lower data size than
> subsequent operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
