hive-dev mailing list archives

From "Prasanth J (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-6578) Use ORC file footer statistics through StatsProvidingRecordReader interface for analyze command
Date Fri, 14 Mar 2014 01:31:44 GMT

     [ https://issues.apache.org/jira/browse/HIVE-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-6578:
-----------------------------

    Attachment: HIVE-6578.3.patch

Addressed [~sershe]'s review comments. Had an offline discussion with [~owen.omalley] and addressed
Owen's comments as well. The following are the changes in this patch:
1) Added a new StatsProvidingRecordReader interface (similar to StatsProvidingRecordWriter),
which is used to get stats from the record reader. This works even if a partition contains
data written in different file formats.
2) Removed ORC-specific references from StatsNoJobTask. It should now work with any file format
that implements the StatsProvidingRecordReader interface.
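The pattern behind the two changes above can be illustrated with a minimal Java sketch. It assumes simplified, hypothetical types (`FileStats`, `RecordReader`, `FooterStatsReader`, `PlainReader`, `collect`) that are not Hive's actual classes or signatures; only the interface name StatsProvidingRecordReader comes from this message. What it shows is that the task checks for the interface rather than for a concrete format such as ORC, so readers for different formats in the same partition can be handled side by side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: simplified stand-ins, not Hive's real classes or signatures.
class StatsSketch {

    /** Basic per-file stats a format may already store, e.g. in an ORC file footer. */
    static final class FileStats {
        final long rowCount;
        final long rawDataSize;

        FileStats(long rowCount, long rawDataSize) {
            this.rowCount = rowCount;
            this.rawDataSize = rawDataSize;
        }
    }

    /** Minimal stand-in for a record reader. */
    interface RecordReader {
        boolean next();
    }

    /** Implemented by readers that can report stats without scanning rows. */
    interface StatsProvidingRecordReader {
        FileStats getStats();
    }

    /** Reader for a format that keeps stats in its footer. */
    static final class FooterStatsReader implements RecordReader, StatsProvidingRecordReader {
        private final FileStats footer;

        FooterStatsReader(FileStats footer) {
            this.footer = footer;
        }

        @Override
        public boolean next() {
            return false; // rows are never read when collecting stats
        }

        @Override
        public FileStats getStats() {
            return footer;
        }
    }

    /** Reader for a format with no stored stats. */
    static final class PlainReader implements RecordReader {
        @Override
        public boolean next() {
            return false;
        }
    }

    /**
     * Format-agnostic collection: take stats from every reader that implements the
     * interface, and add the rest to needScan so a caller could fall back to a scan.
     */
    static List<FileStats> collect(List<RecordReader> readers, List<RecordReader> needScan) {
        List<FileStats> stats = new ArrayList<>();
        for (RecordReader reader : readers) {
            if (reader instanceof StatsProvidingRecordReader) {
                stats.add(((StatsProvidingRecordReader) reader).getStats());
            } else {
                needScan.add(reader);
            }
        }
        return stats;
    }
}
```

The `needScan` fallback list is a design choice of this sketch only; the message does not say what StatsNoJobTask does with readers that lack the interface.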

> Use ORC file footer statistics through StatsProvidingRecordReader interface for analyze command
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6578
>                 URL: https://issues.apache.org/jira/browse/HIVE-6578
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>         Attachments: HIVE-6578.1.patch, HIVE-6578.2.patch, HIVE-6578.3.patch
>
>
> ORC provides file-level statistics that can be used in the analyze partialscan and noscan
> cases to compute basic statistics such as the number of rows, number of files, total file size
> and raw data size. On the writer side, a new interface (StatsProvidingRecordWriter) was added
> earlier to expose stats when writing a table. Similarly, a new interface, StatsProvidingRecordReader,
> can be added which, when implemented, provides the stats gathered by the underlying file format.
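As a worked illustration of the four basic statistics listed above, the following hypothetical Java sketch sums file-level numbers into partition-level totals without reading any rows. The class and field names are assumptions, not Hive code; the map keys are chosen to mirror the parameter names Hive uses for basic stats (numRows, numFiles, totalSize, rawDataSize). The file size is assumed to come from the file system listing rather than from the footer itself.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: aggregates per-file stats into partition-level basic stats.
class PartitionStatsSketch {

    /** Per-file numbers: row count and raw size from the footer, file length from the listing. */
    static final class FileFooterStats {
        final long numRows;
        final long fileSize;
        final long rawDataSize;

        FileFooterStats(long numRows, long fileSize, long rawDataSize) {
            this.numRows = numRows;
            this.fileSize = fileSize;
            this.rawDataSize = rawDataSize;
        }
    }

    /** Sums file-level stats; numFiles is simply the number of files seen. */
    static Map<String, Long> aggregate(List<FileFooterStats> files) {
        long rows = 0;
        long size = 0;
        long raw = 0;
        for (FileFooterStats f : files) {
            rows += f.numRows;
            size += f.fileSize;
            raw += f.rawDataSize;
        }
        Map<String, Long> stats = new LinkedHashMap<>();
        stats.put("numRows", rows);
        stats.put("numFiles", (long) files.size());
        stats.put("totalSize", size);
        stats.put("rawDataSize", raw);
        return stats;
    }
}
```

For example, two files with (rows, size, raw size) of (100, 4096, 8000) and (250, 8192, 20000) aggregate to numRows=350, numFiles=2, totalSize=12288 and rawDataSize=28000; these figures are made up for illustration.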



--
This message was sent by Atlassian JIRA
(v6.2#6252)
