hive-issues mailing list archives

From "Prasanth Jayachandran (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-22979) Support total file size in statistics annotation
Date Fri, 06 Mar 2020 17:53:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-22979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053639#comment-17053639 ]

Prasanth Jayachandran commented on HIVE-22979:
----------------------------------------------

[~jcamachorodriguez] thanks for the review! I agree that this will be very useful for
debugging issues, and it would be good to have it in the explain output (at all explain
levels), since that gives a single place to see both the on-disk file size and the estimated
raw data size (and perhaps infer the compression factor). Created HIVE-22994 to tackle this
separately after this patch, as it will touch almost all of the explain .out files.
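As an illustration of the "compression factor" mentioned above, here is a minimal sketch that divides the estimated raw data size by the total on-disk file size. The class, method, and variable names are hypothetical and are not part of Hive's API; the sizes are made-up example values.

```java
// Hypothetical sketch: names are illustrative, not Hive's Statistics API.
class CompressionFactorSketch {

    /** Ratio of estimated raw (decompressed, decoded) data size to on-disk file size. */
    static double compressionFactor(long rawDataSize, long totalFileSize) {
        if (totalFileSize <= 0) {
            throw new IllegalArgumentException("totalFileSize must be positive");
        }
        return (double) rawDataSize / totalFileSize;
    }

    public static void main(String[] args) {
        long rawDataSize = 4_000_000_000L;   // example: ~4 GB estimated raw data size
        long totalFileSize = 1_000_000_000L; // example: ~1 GB stored on disk
        System.out.println(compressionFactor(rawDataSize, totalFileSize)); // prints 4.0
    }
}
```

A factor well above 1 suggests the data is heavily compressed or encoded on disk, which is exactly the gap that raw data size alone cannot reveal.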

> Support total file size in statistics annotation
> ------------------------------------------------
>
>                 Key: HIVE-22979
>                 URL: https://issues.apache.org/jira/browse/HIVE-22979
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 4.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-22979.1.patch, HIVE-22979.2.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hive statistics annotation provides estimated statistics for each operator. The data size
> provided in TableScanOperator is the raw data size (after decompression and decoding), but
> some optimizations, such as scan cost estimation, can be performed based on the total file
> size on disk.
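To show why on-disk size matters for scan cost estimation, the sketch below estimates the number of input splits from a byte count, assuming a fixed split size of 256 MB. The cost model, split size, and all names are assumptions for illustration only, not Hive's actual optimizer logic. Two tables with the same raw data size but different on-disk sizes get the same estimate from raw size, and different estimates from file size.

```java
// Hypothetical sketch: a toy scan-cost model, not Hive's optimizer.
class ScanCostSketch {

    /** Estimated splits if each split covers at most splitSizeBytes of input (ceiling division). */
    static long estimatedSplits(long bytes, long splitSizeBytes) {
        if (splitSizeBytes <= 0) {
            throw new IllegalArgumentException("splitSizeBytes must be positive");
        }
        if (bytes <= 0) {
            return 0;
        }
        return (bytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long splitSize = 256L * 1024 * 1024; // assumed split size: 256 MB

        long rawDataSize = 10 * gib;        // same raw size for both example tables
        long compressedOnDisk = 2 * gib;    // compressed table on disk
        long uncompressedOnDisk = 10 * gib; // uncompressed table on disk

        System.out.println(estimatedSplits(rawDataSize, splitSize));        // 40 for either table
        System.out.println(estimatedSplits(compressedOnDisk, splitSize));   // 8
        System.out.println(estimatedSplits(uncompressedOnDisk, splitSize)); // 40
    }
}
```

Using the raw size overestimates the scan work for the compressed table by 5x in this example; the total file size gives a closer proxy for the bytes actually read.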



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
