hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Goden Yao (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-10888) Hive Dynamic Partition + Default Partition makes Null Values Not querable
Date Mon, 01 Jun 2015 23:55:17 GMT
Goden Yao created HIVE-10888:
--------------------------------

             Summary: Hive Dynamic Partition + Default Partition makes Null Values Not querable
                 Key: HIVE-10888
                 URL: https://issues.apache.org/jira/browse/HIVE-10888
             Project: Hive
          Issue Type: Bug
          Components: Hive, Query Processor
            Reporter: Goden Yao


This is reported by Pivotal.io (Noa Horn)
And HAWQ latest version should have this fixed in our queries.

=== Expected Behavior ===
When dynamic partition enabled and mode = nonstrict, the null value in the default partition
should still be returned when user specify that in "...WHERE.... is Null".

=== Problem statment ===
*Enable dynamic partitions*
{code}
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
#Get default partition name:
hive.exec.default.partition.name
Default Value: _HIVE_DEFAULT_PARTITION_
{code}

Hive creates a default partition if the partition key value doesn’t conform to the field
type. For example, if the partition key is NULL.

*Hive Example*

Add the following parameters to hive-site.xml
{code}
    	<property>
            	<name>hive.exec.dynamic.partition</name>
            	<value>true</value>
    	</property>
    	<property>
            	<name>hive.exec.dynamic.partition.mode</name>
            	<value>true</value>
    	</property>
{code}

Create data:
vi /tmp/base_data.txt
1,1.0,1900-01-01
2,2.2,1994-04-14
3,3.3,2011-03-31
4,4.5,bla
5,5.0,2013-12-06

Create hive table and load the data to it. This table is used to load data to the partition
table.

{code}
hive>
CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
SELECT * FROM base;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
4    4.5    NULL
5    5.0    2013-12-06
{code}
Note that one of the rows has NULL in its date field.

Create hive partition table and load data from base table to it. The data will be dynamically
partitioned
{code}
CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
SELECT * FROM sales;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
5    5.0    2013-12-06
4    4.5    NULL
{code}

Check that the table has different partitions
{code}
hdfs dfs -ls /hive/warehouse/sales
Found 5 items
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=1900-01-01
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=1994-04-14
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=2011-03-31
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=2013-12-06
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
{code}

Hive queries with default partition

Queries without a filter or with a filter on a different field returns the default partition
data:

{code}
hive> select * from sales;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
5    5.0    2013-12-06
4    4.5    NULL
Time taken: 0.578 seconds, Fetched: 5 row(s)
{code}

Queries with a filter on the partition field omit the default partition data:
{code}
hive> select * from sales where date <> '2013-12-06';
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
Time taken: 0.19 seconds, Fetched: 3 row(s)


hive> select * from sales where date is null;   	 
OK
Time taken: 0.035 seconds

hive> select * from sales where date is not null;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
5    5.0    2013-12-06
Time taken: 0.042 seconds, Fetched: 4 row(s)

hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
OK
Time taken: 0.056 seconds
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message