hive-user mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: ORC files and statistics
Date Tue, 19 Jan 2016 16:27:46 GMT
It has both. Each index has statistics (min, max, count, and sum) for each
column in a row group of 10,000 rows (the default). It also has the location
of the start of each row group, so that the reader can jump straight to the
beginning of the row group. The reader takes a SearchArgument (e.g. age >
100) that limits which rows are required for the query, so it can avoid
reading an entire file, or at least sections of the file.

.. Owen
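
The row-group skipping described above can be illustrated with a small
sketch. This is plain Python, not the ORC reader or the SearchArgument API:
the names RowGroupIndex, build_index and scan_greater_than are hypothetical,
and a list offset stands in for the row group's position in the file. It
only shows how per-row-group min/max statistics let a reader skip row groups
that cannot match a predicate such as age > 100.

```python
from dataclasses import dataclass
from typing import List, Tuple

ROW_GROUP_SIZE = 10_000  # rows per row group, as described in the message


@dataclass
class RowGroupIndex:
    """Hypothetical index entry: statistics plus where the row group starts."""
    start: int    # offset of the first row (stand-in for a file position)
    minimum: int
    maximum: int
    count: int
    total: int    # sum of the values in the row group


def build_index(values: List[int], group_size: int = ROW_GROUP_SIZE) -> List[RowGroupIndex]:
    """Compute min, max, count, and sum for each row group of one column."""
    index = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        index.append(RowGroupIndex(start, min(group), max(group), len(group), sum(group)))
    return index


def scan_greater_than(values: List[int], index: List[RowGroupIndex],
                      threshold: int) -> Tuple[List[int], int]:
    """Evaluate `value > threshold`, reading only row groups that might match."""
    matches, groups_read = [], 0
    for entry in index:
        if entry.maximum <= threshold:
            continue  # the statistics prove no row here can match, so skip it
        groups_read += 1
        # Jump straight to the start of the row group and read only its rows.
        group = values[entry.start:entry.start + entry.count]
        matches.extend(v for v in group if v > threshold)
    return matches, groups_read


# Three row groups of ages; only the last one contains a value above 100.
ages = [i % 90 for i in range(30_000)]
ages[25_000] = 104
index = build_index(ages)
matches, groups_read = scan_greater_than(ages, index, 100)
print(matches, groups_read, len(index))
```

In this sketch two of the three row groups are skipped using the max
statistic alone, which is the sense in which the index and the statistics
work together.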

On Tue, Jan 19, 2016 at 7:50 AM, Ashok Kumar <ashok34668@yahoo.com> wrote:

> Hi,
>
> I have read some notes on ORC files in Hive and indexes.
>
> The document describes the indexes but makes reference to statistics:
>
> Indexes <https://orc.apache.org/docs/indexes.html>
>
> "ORC provides three levels of indexes within each file: file level -
> statistics about the values in each column across the entire file"
>
> I am confused, as it seems to mix up indexes with statistics. Can someone
> clarify this?
>
> Thanks
>
