hive-dev mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-14265) Partial stats in Join operator may lead to data size estimate of 0
Date Mon, 18 Jul 2016 12:28:20 GMT
Jesus Camacho Rodriguez created HIVE-14265:
----------------------------------------------

             Summary: Partial stats in Join operator may lead to data size estimate of 0
                 Key: HIVE-14265
                 URL: https://issues.apache.org/jira/browse/HIVE-14265
             Project: Hive
          Issue Type: Bug
          Components: Statistics
            Reporter: Nita Dembla
            Assignee: Jesus Camacho Rodriguez


For some tables, we might not have the column stats available. However, if the table is partitioned,
we will have the stats for partition columns.

When we estimate the size of the data produced by a join operator, we end up using only the
columns for which stats are available, e.g., the partition columns in this case.

Even in these cases, the estimate should also include the data size of the columns for which we do
not have stats (_default size for the column type x estimated number of rows_).
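
As a rough illustration of the formula above, here is a minimal sketch in Python. It is not Hive's actual code: the function names, the `DEFAULT_TYPE_SIZE` table, and the byte widths in it are assumptions made up for the example. It compares an estimate that sums only over columns with stats against one that falls back to a default size per column type.

```python
# Hypothetical default widths in bytes per column type.
# These values are assumptions for illustration, not Hive's constants.
DEFAULT_TYPE_SIZE = {"int": 4, "bigint": 8, "double": 8, "string": 100}


def estimate_with_known_stats_only(num_rows, columns, col_stats):
    """Sum the data size over only those columns that have stats.

    columns:   list of (name, type) pairs produced by the operator
    col_stats: dict mapping column name -> average column length in bytes
    """
    return sum(col_stats[name] * num_rows
               for name, _ in columns if name in col_stats)


def estimate_with_default_fallback(num_rows, columns, col_stats):
    """Same sum, but columns without stats contribute
    default size for the column type x estimated number of rows."""
    total = 0
    for name, col_type in columns:
        avg_len = col_stats.get(name, DEFAULT_TYPE_SIZE[col_type])
        total += avg_len * num_rows
    return total


# Stats exist only for the partition column y; the output contains only x.
stats = {"y": 4}
output_columns = [("x", "int")]
rows = 2

stats_only = estimate_with_known_stats_only(rows, output_columns, stats)
with_fallback = estimate_with_default_fallback(rows, output_columns, stats)
```

In this toy setup no output column has stats, so the stats-only sum is 0, while the fallback version yields 2 rows x 4 bytes = 8.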

To reproduce, the following example can be used:

{noformat}
create table sample_partitioned (x int) partitioned by (y int);
insert into sample_partitioned partition(y=1) values (1),(2);
create temporary table sample as select * from sample_partitioned;
analyze table sample compute statistics for columns;

explain select sample_partitioned.x from sample_partitioned, sample where sample.y = sample_partitioned.y;
{noformat}



