hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chaoyu Tang" <ctang...@gmail.com>
Subject Re: Review Request 33171: HIVE-10307:Support to use number literals in partition column
Date Fri, 17 Apr 2015 20:14:43 GMT


> On April 16, 2015, 6:51 p.m., Jimmy Xiang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 1235
> > <https://reviews.apache.org/r/33171/diff/2/?file=930355#file930355line1235>
> >
> >     Why do we need this checking? Does this mean we can't use the default partition
name for none dynamic partition?
> 
> Chaoyu Tang wrote:
>     No, that is basically to allow the default partition (from dynamic partition) automatically
skip the type check. Currently the default partition value from dynamic partition insert is
actually from HiveConf hive.exec.default.partition.name in type of string with default value
"__HIVE_DEFAULT_PARTITION__", but it is used for any type of partition column.
> 
> Jimmy Xiang wrote:
>     The node is also not put into the map, any side effect? The default name can change,
right? For example, before rename and after rename. How big will be the performance gain?

1. The astExprNodeMap is an empty Map which is passed to getPartExprNodeDesc to get all the
partitions whose value needs to be validated. For any partition with defaultPartitionName,
it should not be put into this map, so it won't get type checked in validatePartColumnType
method. 
In validatePartColumnType, there are actually two maps, partSpec which contains all the partVals
including defaultName, and astExprNodeMap which only contains the partVal nodes to be typechecked.
Use astExprNodeMap to convert any partVal in partSpec which needs typecheck/conversion.

The code might be quite hard to be read since it passes a collection as a parameter to a method,
and modifiy its element in the method.

2. yes, default name can be change (or set) through seting HiveConf property hive.exec.default.partition.name.
The defaultPartitonName is the result from dynamic partition insert. For example, during dynamic
partiton insert, if there is no value(null) found to be a partition column value, the system
will replace it with this defaultPartitionName. User usually won't rename a partition to the
defaultPartitonName.

3. Skipping type check for the defaultPartitionName is not mainly for the performance gain.
Since we know one defaultPartitionName is used in the system for differnt types of partition
column, it is meanless to check its type. For example, if the partition column type is int,
while the defaultParitionName is a string but not in the number format say HIVE_DEFAULT_PARTITION.
Typecheking this defaultPartitionName for type int will fail unless every time you turn the
validation off.


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33171/#review80275
-----------------------------------------------------------


On April 16, 2015, 9:38 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33171/
> -----------------------------------------------------------
> 
> (Updated April 16, 2015, 9:38 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10307
>     https://issues.apache.org/jira/browse/HIVE-10307
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as literals with
postfix like Y, S, L, or BD appended to the number. These literals work in most Hive queries,
but do not when they are used as partition column value. This patch is to address the issue
of number literals used in partition specification.
> Highlights of the changes:
> 1. Validate, convert and normalize the partVal in partSpec to match its column type when
hive.partition.check.column.type is set to true (default). It not only applies to opertion
insert which used to be controlled by "hive.typecheck.on.insert", but also for other partition
operations (e.g. alter table .. partition, partition statistics etc). The hive.typecheck.on.insert
is now removed.
> 2. Convert and normalize legacy partition column data by using alter table partition
.. rename with hive.partition.check.old.column.type.in.rename set to true. this property only
allows the partVal in old PartSpec to skip the type check, conversion in partition rename.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e138800 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 19234b5 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java e8066be

>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 8302067 
>   ql/src/test/queries/clientpositive/alter_partition_coltype.q 8c9945c 
>   ql/src/test/queries/clientpositive/partition_coltype_literals.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/partition_type_check.q c9bca99 
>   ql/src/test/results/clientnegative/archive_partspec1.q.out da4817c 
>   ql/src/test/results/clientnegative/archive_partspec5.q.out c18de52 
>   ql/src/test/results/clientpositive/partition_coltype_literals.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/partition_timestamp.q.out bc6ab10 
>   ql/src/test/results/clientpositive/partition_timestamp2.q.out 365df69 
> 
> Diff: https://reviews.apache.org/r/33171/diff/
> 
> 
> Testing
> -------
> 
> 1. Manaully tests covering various number literals (Y, S, L, BD)
> 2. new qfile test (partition_coltype_literals.q)
> 3. Precommit build
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message