hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chaoyu Tang" <ctang...@gmail.com>
Subject Re: Review Request 33171: HIVE-10307:Support to use number literals in partition column
Date Fri, 17 Apr 2015 15:25:06 GMT


> On April 16, 2015, 6:51 p.m., Jimmy Xiang wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1948
> > <https://reviews.apache.org/r/33171/diff/2/?file=930354#file930354line1948>
> >
> >     What will happen if user uses the old parameter?

I thought it might be better to hard stop using this property assuming there might not many
users are using it. Based on our previous conversation, I tired to make it coexist with new
property hive.partition.check.column.type for a while, but soon found the problem. Before
the property hive.typecheck.on.insert only applies to partition insert, if I still keep it,
it will actually be extended to control other partition operations (alter/describe), so if
we need disable the typecheck for an operation say describe, we need have to set both the
new and old properties to false. It will cause the confusion to users and the old one hard
to be retired in future. The code will become too convulted if we did not put the property
control in commom logic. What do you think?


> On April 16, 2015, 6:51 p.m., Jimmy Xiang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 979
> > <https://reviews.apache.org/r/33171/diff/2/?file=930355#file930355line979>
> >
> >     This function is not used any more?

Yes, there is no other reference to it in Hive. It is a protected method, but I do not think
any other external applications extending the BaseAnalyticSemantic class and using this method.
Initially I kept this API and make it have a delegate call to DDLAnalyticSemanic.getPartSpec,
but I thought it might not necessary. What do you think?


> On April 16, 2015, 6:51 p.m., Jimmy Xiang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 1235
> > <https://reviews.apache.org/r/33171/diff/2/?file=930355#file930355line1235>
> >
> >     Why do we need this checking? Does this mean we can't use the default partition
name for none dynamic partition?

No, that is basically to allow the default partition (from dynamic partition) automatically
skip the type check. Currently the default partition value from dynamic partition insert is
actually from HiveConf hive.exec.default.partition.name in type of string with default value
"__HIVE_DEFAULT_PARTITION__", but it is used for any type of partition column.


> On April 16, 2015, 6:51 p.m., Jimmy Xiang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 1279
> > <https://reviews.apache.org/r/33171/diff/2/?file=930355#file930355line1279>
> >
> >     It is changed from writable OI to java OI?

Actually the value stored in ExprNodeConstantDesc is Java type instead of Hadoop Writable
type (e.g. see TypeCheckProcFactory.NumExprProcessor), so we should use getStandardJavaObjectInspectorFromTypeInfo


> On April 16, 2015, 6:51 p.m., Jimmy Xiang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 1285
> > <https://reviews.apache.org/r/33171/diff/2/?file=930355#file930355line1285>
> >
> >     Will the output format be a little different in some case, due to the OI change?

For output OI, I believe either should be OK since we eventually convert its value to string
and put into partSpec<String, String>. Just to be consistent with the input OI.


> On April 16, 2015, 6:51 p.m., Jimmy Xiang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 1286
> > <https://reviews.apache.org/r/33171/diff/2/?file=930355#file930355line1286>
> >
> >     Two spaces after // for the comment will be great.

Changed the "//Since partVal is a constant, it is safe to cast ExprNodeDesc to ExprNodeConstantDesc."
to "//  Since partVal is a constant, it is safe to cast ExprNodeDesc to ExprNodeConstantDesc."
is it what you suggested?


> On April 16, 2015, 6:51 p.m., Jimmy Xiang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 1302
> > <https://reviews.apache.org/r/33171/diff/2/?file=930355#file930355line1302>
> >
> >     We don't normalize the column spec any more?

No! since the conversion has been done in the type convert. java.sql.Date takes any string
like "YYYY-[M]M-[D]D" and its toString always return YYYY-MM-DD the normalized format we want.


> On April 16, 2015, 6:51 p.m., Jimmy Xiang wrote:
> > ql/src/test/results/clientpositive/partition_timestamp.q.out, line 17
> > <https://reviews.apache.org/r/33171/diff/2/?file=930364#file930364line17>
> >
> >     Is the output change because the OI is changed from writable OI to java OI?

No, it is because of the test, for example in partition_timestamp, the query:
insert overwrite table partition_timestamp_1 partition(dt='2000-01-01 01:00:00', region= '1')
select * from src tablesample (10 rows);
it passes a string '2000-01-01 01:00:00' as the value to partition column of timestamp type,
when the string is converted to timestamp whose normalized value is 2000-01-01 01:00:00.0
(add 0 for the nano). so we have to modify the output file.

If the initial test was written as:
insert overwrite table partition_timestamp_1 partition(dt=timestamp '2000-01-01 01:00:00',
region= '1') select * from src tablesample (10 rows);
we would not have this problem in its q.out file.


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33171/#review80275
-----------------------------------------------------------


On April 16, 2015, 9:38 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33171/
> -----------------------------------------------------------
> 
> (Updated April 16, 2015, 9:38 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10307
>     https://issues.apache.org/jira/browse/HIVE-10307
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as literals with
postfix like Y, S, L, or BD appended to the number. These literals work in most Hive queries,
but do not when they are used as partition column value. This patch is to address the issue
of number literals used in partition specification.
> Highlights of the changes:
> 1. Validate, convert and normalize the partVal in partSpec to match its column type when
hive.partition.check.column.type is set to true (default). It not only applies to opertion
insert which used to be controlled by "hive.typecheck.on.insert", but also for other partition
operations (e.g. alter table .. partition, partition statistics etc). The hive.typecheck.on.insert
is now removed.
> 2. Convert and normalize legacy partition column data by using alter table partition
.. rename with hive.partition.check.old.column.type.in.rename set to true. this property only
allows the partVal in old PartSpec to skip the type check, conversion in partition rename.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e138800 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 19234b5 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java e8066be

>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 8302067 
>   ql/src/test/queries/clientpositive/alter_partition_coltype.q 8c9945c 
>   ql/src/test/queries/clientpositive/partition_coltype_literals.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/partition_type_check.q c9bca99 
>   ql/src/test/results/clientnegative/archive_partspec1.q.out da4817c 
>   ql/src/test/results/clientnegative/archive_partspec5.q.out c18de52 
>   ql/src/test/results/clientpositive/partition_coltype_literals.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/partition_timestamp.q.out bc6ab10 
>   ql/src/test/results/clientpositive/partition_timestamp2.q.out 365df69 
> 
> Diff: https://reviews.apache.org/r/33171/diff/
> 
> 
> Testing
> -------
> 
> 1. Manaully tests covering various number literals (Y, S, L, BD)
> 2. new qfile test (partition_coltype_literals.q)
> 3. Precommit build
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message