hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Goden Yao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-10888) Hive Dynamic Partition + Default Partition makes Null Values Not querable
Date Thu, 14 Jul 2016 17:40:21 GMT

     [ https://issues.apache.org/jira/browse/HIVE-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Goden Yao updated HIVE-10888:
-----------------------------
    Description: 
This is reported by @hor
And HAWQ latest version should have this fixed in our queries.

=== Expected Behavior ===
When dynamic partition enabled and mode = nonstrict, the null value in the default partition
should still be returned when user specify that in "...WHERE.... is Null".

=== Problem statment ===
*Enable dynamic partitions*
{code}
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
#Get default partition name:
hive.exec.default.partition.name
Default Value: _HIVE_DEFAULT_PARTITION_
{code}

Hive creates a default partition if the partition key value doesn’t conform to the field
type. For example, if the partition key is NULL.

*Hive Example*

Add the following parameters to hive-site.xml
{code}
    	<property>
            	<name>hive.exec.dynamic.partition</name>
            	<value>true</value>
    	</property>
    	<property>
            	<name>hive.exec.dynamic.partition.mode</name>
            	<value>true</value>
    	</property>
{code}

Create data:
vi /tmp/base_data.txt
1,1.0,1900-01-01
2,2.2,1994-04-14
3,3.3,2011-03-31
4,4.5,bla
5,5.0,2013-12-06

Create hive table and load the data to it. This table is used to load data to the partition
table.

{code}
hive>
CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
SELECT * FROM base;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
4    4.5    NULL
5    5.0    2013-12-06
{code}
Note that one of the rows has NULL in its date field.

Create hive partition table and load data from base table to it. The data will be dynamically
partitioned
{code}
CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
SELECT * FROM sales;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
5    5.0    2013-12-06
4    4.5    NULL
{code}

Check that the table has different partitions
{code}
hdfs dfs -ls /hive/warehouse/sales
Found 5 items
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=1900-01-01
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=1994-04-14
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=2011-03-31
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=2013-12-06
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
{code}

Hive queries with default partition

Queries without a filter or with a filter on a different field returns the default partition
data:

{code}
hive> select * from sales;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
5    5.0    2013-12-06
4    4.5    NULL
Time taken: 0.578 seconds, Fetched: 5 row(s)
{code}

Queries with a filter on the partition field omit the default partition data:
{code}
hive> select * from sales where date <> '2013-12-06';
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
Time taken: 0.19 seconds, Fetched: 3 row(s)


hive> select * from sales where date is null;   	 
OK
Time taken: 0.035 seconds

hive> select * from sales where date is not null;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
5    5.0    2013-12-06
Time taken: 0.042 seconds, Fetched: 4 row(s)

hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
OK
Time taken: 0.056 seconds
{code}


  was:
This is reported by Pivotal.io (Noa Horn)
And HAWQ latest version should have this fixed in our queries.

=== Expected Behavior ===
When dynamic partition enabled and mode = nonstrict, the null value in the default partition
should still be returned when user specify that in "...WHERE.... is Null".

=== Problem statment ===
*Enable dynamic partitions*
{code}
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
#Get default partition name:
hive.exec.default.partition.name
Default Value: _HIVE_DEFAULT_PARTITION_
{code}

Hive creates a default partition if the partition key value doesn’t conform to the field
type. For example, if the partition key is NULL.

*Hive Example*

Add the following parameters to hive-site.xml
{code}
    	<property>
            	<name>hive.exec.dynamic.partition</name>
            	<value>true</value>
    	</property>
    	<property>
            	<name>hive.exec.dynamic.partition.mode</name>
            	<value>true</value>
    	</property>
{code}

Create data:
vi /tmp/base_data.txt
1,1.0,1900-01-01
2,2.2,1994-04-14
3,3.3,2011-03-31
4,4.5,bla
5,5.0,2013-12-06

Create hive table and load the data to it. This table is used to load data to the partition
table.

{code}
hive>
CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
SELECT * FROM base;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
4    4.5    NULL
5    5.0    2013-12-06
{code}
Note that one of the rows has NULL in its date field.

Create hive partition table and load data from base table to it. The data will be dynamically
partitioned
{code}
CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
SELECT * FROM sales;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
5    5.0    2013-12-06
4    4.5    NULL
{code}

Check that the table has different partitions
{code}
hdfs dfs -ls /hive/warehouse/sales
Found 5 items
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=1900-01-01
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=1994-04-14
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=2011-03-31
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=2013-12-06
drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
{code}

Hive queries with default partition

Queries without a filter or with a filter on a different field returns the default partition
data:

{code}
hive> select * from sales;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
5    5.0    2013-12-06
4    4.5    NULL
Time taken: 0.578 seconds, Fetched: 5 row(s)
{code}

Queries with a filter on the partition field omit the default partition data:
{code}
hive> select * from sales where date <> '2013-12-06';
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
Time taken: 0.19 seconds, Fetched: 3 row(s)


hive> select * from sales where date is null;   	 
OK
Time taken: 0.035 seconds

hive> select * from sales where date is not null;
OK
1    1.0    1900-01-01
2    2.2    1994-04-14
3    3.3    2011-03-31
5    5.0    2013-12-06
Time taken: 0.042 seconds, Fetched: 4 row(s)

hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
OK
Time taken: 0.056 seconds
{code}



> Hive Dynamic Partition + Default Partition makes Null Values Not querable
> -------------------------------------------------------------------------
>
>                 Key: HIVE-10888
>                 URL: https://issues.apache.org/jira/browse/HIVE-10888
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Query Processor
>            Reporter: Goden Yao
>
> This is reported by @hor
> And HAWQ latest version should have this fixed in our queries.
> === Expected Behavior ===
> When dynamic partition enabled and mode = nonstrict, the null value in the default partition
should still be returned when user specify that in "...WHERE.... is Null".
> === Problem statment ===
> *Enable dynamic partitions*
> {code}
> hive.exec.dynamic.partition = true
> hive.exec.dynamic.partition.mode = nonstrict
> #Get default partition name:
> hive.exec.default.partition.name
> Default Value: _HIVE_DEFAULT_PARTITION_
> {code}
> Hive creates a default partition if the partition key value doesn’t conform to the
field type. For example, if the partition key is NULL.
> *Hive Example*
> Add the following parameters to hive-site.xml
> {code}
>     	<property>
>             	<name>hive.exec.dynamic.partition</name>
>             	<value>true</value>
>     	</property>
>     	<property>
>             	<name>hive.exec.dynamic.partition.mode</name>
>             	<value>true</value>
>     	</property>
> {code}
> Create data:
> vi /tmp/base_data.txt
> 1,1.0,1900-01-01
> 2,2.2,1994-04-14
> 3,3.3,2011-03-31
> 4,4.5,bla
> 5,5.0,2013-12-06
> Create hive table and load the data to it. This table is used to load data to the partition
table.
> {code}
> hive>
> CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
> SELECT * FROM base;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 4    4.5    NULL
> 5    5.0    2013-12-06
> {code}
> Note that one of the rows has NULL in its date field.
> Create hive partition table and load data from base table to it. The data will be dynamically
partitioned
> {code}
> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
> INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
> SELECT * FROM sales;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 5    5.0    2013-12-06
> 4    4.5    NULL
> {code}
> Check that the table has different partitions
> {code}
> hdfs dfs -ls /hive/warehouse/sales
> Found 5 items
> drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=1900-01-01
> drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=1994-04-14
> drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=2011-03-31
> drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=2013-12-06
> drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
> {code}
> Hive queries with default partition
> Queries without a filter or with a filter on a different field returns the default partition
data:
> {code}
> hive> select * from sales;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 5    5.0    2013-12-06
> 4    4.5    NULL
> Time taken: 0.578 seconds, Fetched: 5 row(s)
> {code}
> Queries with a filter on the partition field omit the default partition data:
> {code}
> hive> select * from sales where date <> '2013-12-06';
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> Time taken: 0.19 seconds, Fetched: 3 row(s)
> hive> select * from sales where date is null;   	 
> OK
> Time taken: 0.035 seconds
> hive> select * from sales where date is not null;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 5    5.0    2013-12-06
> Time taken: 0.042 seconds, Fetched: 4 row(s)
> hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
> OK
> Time taken: 0.056 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message