hive-dev mailing list archives

From Remus Rusanu <rem...@microsoft.com>
Subject RE: Why do I get statistics diff in EXPLAIN for Parquet?
Date Mon, 17 Feb 2014 15:06:45 GMT
OK, so I get similar diffs with ORC, so this is not a Parquet problem.
The expected .out files were created by running mvn test on Windows, so the issue is Windows-specific,
not Parquet-specific. I'll investigate...

From: Remus Rusanu [mailto:remusr@microsoft.com]
Sent: Monday, February 17, 2014 3:59 PM
To: dev@hive.apache.org
Cc: Brock Noland
Subject: Why do I get statistics diff in EXPLAIN for Parquet?

Looking at the failed Jenkins runs for HIVE-5998, I see diffs in the statistics
in the EXPLAIN output:

Running: diff -a /root/hive/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/vectorized_parquet.q.out /root/hive/itests/qtest/../../ql/src/test/results/clientpositive/vectorized_parquet.q.out
72c72
<             Statistics: Num rows: 12288 Data size: 73728 Basic stats: COMPLETE Column stats: NONE
---
>             Statistics: Num rows: 2072 Data size: 257046 Basic stats: COMPLETE Column stats: NONE
75c75
<               Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
---
>               Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
79c79
<                 Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
---
>                 Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
82c82
<                   Statistics: Num rows: 10 Data size: 60 Basic stats: COMPLETE Column stats: NONE
---
>                   Statistics: Num rows: 10 Data size: 1240 Basic stats: COMPLETE Column stats: NONE
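
One way to read the diff above is to compare the bytes per row implied by each Statistics line (Data size divided by Num rows). The sketch below does only that arithmetic. The helper name bytes_per_row and the regular expression are illustrative assumptions, not part of Hive or its test harness, and the input lines are copied from the diff.

```python
import re

# Matches the two numeric fields of an EXPLAIN "Statistics:" line.
# The pattern is an assumption based on the lines shown in the diff.
STATS_RE = re.compile(r"Num rows: (\d+) Data size: (\d+)")


def bytes_per_row(stats_line):
    """Return Data size / Num rows for one EXPLAIN Statistics line."""
    match = STATS_RE.search(stats_line)
    if match is None:
        raise ValueError("not a Statistics line: %r" % stats_line)
    rows, size = int(match.group(1)), int(match.group(2))
    return size / rows


# (generated "<" line, expected ">" line) pairs taken from the diff.
PAIRS = [
    ("Statistics: Num rows: 12288 Data size: 73728",
     "Statistics: Num rows: 2072 Data size: 257046"),
    ("Statistics: Num rows: 6144 Data size: 36864",
     "Statistics: Num rows: 1036 Data size: 128523"),
    ("Statistics: Num rows: 10 Data size: 60",
     "Statistics: Num rows: 10 Data size: 1240"),
]

if __name__ == "__main__":
    for generated, expected in PAIRS:
        print("generated: %7.2f bytes/row   expected: %7.2f bytes/row"
              % (bytes_per_row(generated), bytes_per_row(expected)))
```

For the lines in this diff, the generated ("<") side works out to 6 bytes per row for every operator, and the expected (">") side to roughly 124 bytes per row. Each side is consistent with itself, so the difference looks like a difference in how the table-level estimate was derived on the two machines rather than in any single operator.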

What would cause such statistics diffs? The Parquet table is created as:

create table if not exists alltypes_parquet (
  cint int,
  ctinyint tinyint,
  csmallint smallint,
  cfloat float,
  cdouble double,
  cstring1 string) stored as parquet;

insert overwrite table alltypes_parquet
  select cint,
    ctinyint,
    csmallint,
    cfloat,
    cdouble,
    cstring1
  from alltypesorc;

Note that there are no diffs in the actual query results.

Thanks,
~Remus
