hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: select count(*) from table;
Date Tue, 22 Mar 2016 08:02:28 GMT
An ORC file keeps statistics for its storage indexes at three levels:


   1. ORC File itself
   2. Multiple stripes (chunks) within the ORC file
   3. Multiple row groups (row batches) within each stripe

Assuming that the stats for the underlying table are up to date, a count will
be stored for each column.
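
For reference, a minimal sketch of how those stats are usually brought up to date, reusing the orctest table name from this thread. The assumption here is a Hive version where the column list after FOR COLUMNS may be omitted; older releases may need the columns named explicitly.

```sql
-- Refresh basic table-level stats (numRows, numFiles, ...) in the metastore
ANALYZE TABLE orctest COMPUTE STATISTICS;

-- Refresh column-level stats as well
ANALYZE TABLE orctest COMPUTE STATISTICS FOR COLUMNS;
```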

So when we run something like:

select count(1) from orctest

you can see the stats that have been collected with:

show create table orctest;

 TBLPROPERTIES (
   'COLUMN_STATS_ACCURATE'='true',
   'numFiles'='31',
   'numRows'='250000',


File, stripe and row group statistics are all kept, so a query on an ORC
table can rely on them when needed.
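
One way to check whether a count is actually being answered from stats rather than from a scan, as a hedged sketch: hive.compute.query.using.stats is the setting that lets Hive answer simple aggregates such as count from stored stats. Its default differs between releases, so setting it explicitly here is an assumption about your environment.

```sql
-- Allow simple aggregates (count, min, max) to be answered from stored stats
SET hive.compute.query.using.stats=true;

-- Inspect the plan: a stats-only answer typically shows just a fetch stage,
-- with no MapReduce/Tez work (exact plan text varies by Hive version)
EXPLAIN SELECT count(1) FROM orctest;

-- Table Parameters in this output include numRows and COLUMN_STATS_ACCURATE
DESCRIBE FORMATTED orctest;
```

If the plan still shows a full scan, the stats are probably stale or marked inaccurate, and re-running ANALYZE TABLE is the usual fix.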


HTH




Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 22 March 2016 at 07:14, Amey Barve <ameybarve15@gmail.com> wrote:

> select count(*) from table;
>
> How does hive evaluate count(*) on a table?
>
> Does it return the count by actually querying the table, or does it return
> the count directly by consulting some statistics locally?
>
> With Hive's Text format it takes a few seconds, while with Hive's ORC format
> it takes a fraction of a second.
>
> Regards,
> Amey
>
