hive-dev mailing list archives

From Remus Rusanu <rem...@microsoft.com>
Subject Why do I get statistics diff in EXPLAIN for Parquet?
Date Mon, 17 Feb 2014 13:59:13 GMT
Looking at the failed Jenkins runs for HIVE-5998, I see diffs in the statistics in the
EXPLAIN output:

Running: diff -a /root/hive/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/vectorized_parquet.q.out /root/hive/itests/qtest/../../ql/src/test/results/clientpositive/vectorized_parquet.q.out
72c72
<             Statistics: Num rows: 12288 Data size: 73728 Basic stats: COMPLETE Column stats: NONE
---
>             Statistics: Num rows: 2072 Data size: 257046 Basic stats: COMPLETE Column stats: NONE
75c75
<               Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
---
>               Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
79c79
<                 Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
---
>                 Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
82c82
<                   Statistics: Num rows: 10 Data size: 60 Basic stats: COMPLETE Column stats: NONE
---
>                   Statistics: Num rows: 10 Data size: 1240 Basic stats: COMPLETE Column stats: NONE
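
As a quick sanity check on the numbers in the diff, here is a small Python sketch that computes the implied bytes per row on each side. The figures are copied from the diff. Labelling the "<" side as the test-run output and the ">" side as the checked-in expected file is an assumption based on the argument order of the diff command, and the variable and function names are purely illustrative. This only restates the arithmetic; it does not identify the cause.

```python
# (num_rows, data_size) pairs copied from the diff, keyed by diff line number.
# Assumption: "<" is the test-run output, ">" is the checked-in expected file.
test_run = {72: (12288, 73728), 75: (6144, 36864), 79: (6144, 36864), 82: (10, 60)}
expected = {72: (2072, 257046), 75: (1036, 128523), 79: (1036, 128523), 82: (10, 1240)}


def bytes_per_row(stats):
    """Return data_size / num_rows for every entry."""
    return {line: size / rows for line, (rows, size) in stats.items()}


test_run_bpr = bytes_per_row(test_run)
expected_bpr = bytes_per_row(expected)

for line in sorted(test_run):
    print(f"line {line}: test run {test_run_bpr[line]:.2f} B/row, "
          f"expected {expected_bpr[line]:.2f} B/row")
```

The test-run side works out to 6 bytes per row on every line, while the expected side works out to about 124 bytes per row, so the two files disagree on the per-row size estimate as well as on the row count.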

What would cause such statistics diffs? The Parquet table is created and populated as:

create table if not exists alltypes_parquet (
  cint int,
  ctinyint tinyint,
  csmallint smallint,
  cfloat float,
  cdouble double,
  cstring1 string) stored as parquet;

insert overwrite table alltypes_parquet
  select cint,
    ctinyint,
    csmallint,
    cfloat,
    cdouble,
    cstring1
  from alltypesorc;

Note that there are no diffs in the actual query results.

Thanks,
~Remus
