hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Charles Pritchard (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10888) Hive Dynamic Partition + Default Partition makes Null Values Not querable
Date Tue, 19 Jan 2016 07:20:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106370#comment-15106370
] 

Charles Pritchard commented on HIVE-10888:
------------------------------------------

I'm seeing a similar issue, in Hive 0.14. I have a two-level partition -- partitioned by (date
string, bucket string) and it seems that most queries do not include the default partition
(for bucket) when run. While I can run  create temp table as select *, and get a fully functioning
table, I can not simply run select * where, and get useable results from the default partition,
when I have a where query.

This may be a regression introduced in HIVE-4878. I'll check through some support channels
to see what I can find. 

> Hive Dynamic Partition + Default Partition makes Null Values Not querable
> -------------------------------------------------------------------------
>
>                 Key: HIVE-10888
>                 URL: https://issues.apache.org/jira/browse/HIVE-10888
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Query Processor
>            Reporter: Goden Yao
>
> This is reported by Pivotal.io (Noa Horn)
> And HAWQ latest version should have this fixed in our queries.
> === Expected Behavior ===
> When dynamic partition enabled and mode = nonstrict, the null value in the default partition
should still be returned when user specify that in "...WHERE.... is Null".
> === Problem statment ===
> *Enable dynamic partitions*
> {code}
> hive.exec.dynamic.partition = true
> hive.exec.dynamic.partition.mode = nonstrict
> #Get default partition name:
> hive.exec.default.partition.name
> Default Value: _HIVE_DEFAULT_PARTITION_
> {code}
> Hive creates a default partition if the partition key value doesn’t conform to the
field type. For example, if the partition key is NULL.
> *Hive Example*
> Add the following parameters to hive-site.xml
> {code}
>     	<property>
>             	<name>hive.exec.dynamic.partition</name>
>             	<value>true</value>
>     	</property>
>     	<property>
>             	<name>hive.exec.dynamic.partition.mode</name>
>             	<value>true</value>
>     	</property>
> {code}
> Create data:
> vi /tmp/base_data.txt
> 1,1.0,1900-01-01
> 2,2.2,1994-04-14
> 3,3.3,2011-03-31
> 4,4.5,bla
> 5,5.0,2013-12-06
> Create hive table and load the data to it. This table is used to load data to the partition
table.
> {code}
> hive>
> CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
> SELECT * FROM base;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 4    4.5    NULL
> 5    5.0    2013-12-06
> {code}
> Note that one of the rows has NULL in its date field.
> Create hive partition table and load data from base table to it. The data will be dynamically
partitioned
> {code}
> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
> INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
> SELECT * FROM sales;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 5    5.0    2013-12-06
> 4    4.5    NULL
> {code}
> Check that the table has different partitions
> {code}
> hdfs dfs -ls /hive/warehouse/sales
> Found 5 items
> drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=1900-01-01
> drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=1994-04-14
> drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=2011-03-31
> drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=2013-12-06
> drwxr-xr-x   - nhorn supergroup      	0 2015-04-30 15:03 /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
> {code}
> Hive queries with default partition
> Queries without a filter or with a filter on a different field returns the default partition
data:
> {code}
> hive> select * from sales;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 5    5.0    2013-12-06
> 4    4.5    NULL
> Time taken: 0.578 seconds, Fetched: 5 row(s)
> {code}
> Queries with a filter on the partition field omit the default partition data:
> {code}
> hive> select * from sales where date <> '2013-12-06';
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> Time taken: 0.19 seconds, Fetched: 3 row(s)
> hive> select * from sales where date is null;   	 
> OK
> Time taken: 0.035 seconds
> hive> select * from sales where date is not null;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 5    5.0    2013-12-06
> Time taken: 0.042 seconds, Fetched: 4 row(s)
> hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
> OK
> Time taken: 0.056 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message