hive-dev mailing list archives

From Zoltan Haindrich <k...@rxd.hu>
Subject Re: Review Request 69019: HIVE-20617 Fix type of constants in IN expressions to have correct type
Date Fri, 19 Oct 2018 19:26:19 GMT


> On Oct. 19, 2018, 9:54 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
> > Line 1172 (original), 1181 (patched)
> > <https://reviews.apache.org/r/69019/diff/1/?file=2097832#file2097832line1195>
> >
> >     This TODO will be good to resolve.
> >     We have type already so we can return null constant of appropriate type here, no?

Sure; it seemed unrelated, and I was not sure whether it would wreak havoc in the q.out-s or not :)
I hope not; fixed.


> On Oct. 19, 2018, 9:54 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
> > Lines 1292-1294 (original), 1416-1418 (patched)
> > <https://reviews.apache.org/r/69019/diff/1/?file=2097832#file2097832line1434>
> >
> >     Not sure this logic is correct. We should not coerce the length of the constant
> >     to be the same as the length of the type.
> >     
> >     Comparisons of 2 char types happen on stripped values. So, if the constant is
> >     of smaller length then this probably won't be a problem, but it is unnecessary.
> >     However, if the constant is longer, it looks like HiveChar will truncate it and
> >     then the comparison will likely be wrong. Better to create a constant char of the
> >     same length as the string.

I've added a method which returns null if the value is not representable as the target char(n)
type. I feel that it also solves this problem.
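
A minimal sketch of such a check, assuming that "representable" means the literal still fits into
char(n) once trailing spaces are ignored. The class and method names are hypothetical and this is
not the code from the patch:

```java
// Hypothetical sketch; not the method added in the patch.
final class CharConstantSketch {

    /** Removes trailing spaces only, mirroring how char(n) values are compared. */
    static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    /**
     * Returns the stripped literal if it fits into char(maxLength),
     * or null if it could only be stored by truncating it.
     */
    static String toCharConstantOrNull(String literal, int maxLength) {
        String stripped = stripTrailingSpaces(literal);
        return stripped.length() <= maxLength ? stripped : null;
    }
}
```

Under these assumptions, for char(10) a literal made of 'a' plus trailing spaces maps to "a"
whatever its length, while a 12-character literal ending in 'd' maps to null, so it cannot match
a truncated column value. The sketch covers char only; varchar keeps trailing spaces.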


> On Oct. 19, 2018, 9:54 a.m., Ashutosh Chauhan wrote:
> > ql/src/test/queries/clientpositive/in_typecheck1.q
> > Lines 7 (patched)
> > <https://reviews.apache.org/r/69019/diff/1/?file=2097833#file2097833line7>
> >
> >     Add: 
> >     select * from ax where t = 'a         ';
> >     select * from ax where t = 'a          ';
> >     select * from ax where t = 'a          d';
> >     
> >     RHS constant is of length 10,11,12. I expect first and second to return 2 rows,
> >     while third to return 0 rows.
> >     
> >     When t is varchar all 3 should return 0 rows.

there was an issue with the last statement:

`select * from ax where t = 'a          d'` was returning 2 rows even without the current
patch - fixed.


> On Oct. 19, 2018, 9:54 a.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/in_typecheck1.q.out
> > Lines 42 (patched)
> > <https://reviews.apache.org/r/69019/diff/1/?file=2097838#file2097838line42>
> >
> >     This doesn't look correct. char/varchar comparisons don't preserve trailing
> >     spaces, e.g.,
> >     create table t1 (a char(3), b varchar(4));
> >     insert into t1 values ('a','b');
> >     select * from t1 where b = 'b';
> >     a	b
> >     select * from t1 where a = 'a';
> >     a   b
> >     
> >     Got this one on both postgres and oracle.
> >     whereas this change would return empty results in Hive, I believe

Yeah, these look somewhat odd at first... I think the same codepath is in action here as the
one used during printout.

The comparison is done without spaces; all the char/varchar tests now seem to be doing well,
so I think we don't have any incorrect results.

Actually psql also displays `char(3)` with 2 trailing whitespaces if the value is for example
'a'.
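
To illustrate the difference between what is printed and what is compared, here is a small
sketch. It assumes that a char(n) value is shown padded to n characters and compared with
trailing spaces ignored; the helper names are hypothetical and do not come from Hive:

```java
// Hypothetical sketch of padded display vs. stripped comparison for char(n).
final class CharDisplaySketch {

    /** Pads the value with spaces up to n characters, as shown in printed output. */
    static String padded(String value, int n) {
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < n) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Removes trailing spaces only. */
    static String stripped(String value) {
        int end = value.length();
        while (end > 0 && value.charAt(end - 1) == ' ') {
            end--;
        }
        return value.substring(0, end);
    }

    /** Compares two char values while ignoring trailing spaces. */
    static boolean charEquals(String left, String right) {
        return stripped(left).equals(stripped(right));
    }
}
```

With this sketch, a char(3) value 'a' is displayed as 'a' followed by two spaces, yet it still
compares equal to the literal 'a'.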


> On Oct. 19, 2018, 9:54 a.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/join45.q.out
> > Line 717 (original), 717 (patched)
> > <https://reviews.apache.org/r/69019/diff/1/?file=2097841#file2097841line717>
> >
> >     Comparison of str and integer is done in double. See e.g. infer_const_type.q.out,
> >     where there is (UDFToDouble(str) = 1234.0D).
> >     However, inside IN we are now casting the constant to string.
> >     Granted, casting to double in this scenario is debatable, but Hive has had this behavior
> >     since the beginning and it needs to be consistent between IN and direct comparison.

Restored the original behaviour; interestingly, this was a side effect of my setting a new arraylist
with the same content on a desc... it might be worth investigating later.
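
A small sketch of why the two casts can disagree, assuming that an unparsable string becomes
NULL when converted to double. The names are hypothetical and this is not Hive's UDFToDouble:

```java
// Hypothetical sketch: string vs. integer compared as double or as string.
final class StringIntCompareSketch {

    /** Converts a string to double; returns null when it is not a number. */
    static Double toDouble(String s) {
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Compares after casting both sides to double; null stands for SQL NULL. */
    static Boolean equalsAsDouble(String str, int constant) {
        Double d = toDouble(str);
        return d == null ? null : Boolean.valueOf(d.doubleValue() == (double) constant);
    }

    /** Compares after casting the integer constant to string. */
    static boolean equalsAsString(String str, int constant) {
        return str.equals(Integer.toString(constant));
    }
}
```

For the value '1234.0' and the constant 1234, the double comparison matches while the string
comparison does not, which is the kind of inconsistency between IN and a direct comparison that
the review comment asks to avoid.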


- Zoltan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69019/#review209778
-----------------------------------------------------------


On Oct. 15, 2018, 8:05 a.m., Zoltan Haindrich wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69019/
> -----------------------------------------------------------
> 
> (Updated Oct. 15, 2018, 8:05 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20617
>     https://issues.apache.org/jira/browse/HIVE-20617
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For IN expressions the types were never corrected, and pointlookupoptimizer was probably
> already leaving behind fields which were uncomparable; HIVE-20296 exposed it further by
> changing the minimal number from 32 to 2.
> 
> This change generalizes the retyping of constants so that it also runs for the IN operator,
> and also for struct-s.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 4968d16876c5c9cc36ec9a3ec48c2740c2c67dcd
>   ql/src/test/queries/clientpositive/in_typecheck1.q PRE-CREATION
>   ql/src/test/queries/clientpositive/in_typecheck2.q PRE-CREATION
>   ql/src/test/results/clientpositive/alter_partition_coltype.q.out 5727f0a65c6e4736f41017e4e962d932dedbd6bd
>   ql/src/test/results/clientpositive/cbo_rp_simple_select.q.out 43cb5ab89fdebde8be168d7837d8e54a38f4d10b
>   ql/src/test/results/clientpositive/cbo_simple_select.q.out 2073c6b802a1ae0ff4228a86f18ec366ff92ab02
>   ql/src/test/results/clientpositive/in_typecheck1.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/in_typecheck2.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/infer_const_type.q.out 4129bd0c715635eb83c0c0d248eb43a5779c7be9
>   ql/src/test/results/clientpositive/join45.q.out 77dbaa2cd8b5be7158545c696b30dc1068238f91
>   ql/src/test/results/clientpositive/join47.q.out 2536f7f4b6e9295d1177632b7f32f0b66974e3a4
>   ql/src/test/results/clientpositive/llap/cbo_simple_select.q.out e61300b5c853eb733d4443c047344e3fc6fe0ff3
>   ql/src/test/results/clientpositive/llap/dec_str.q.out 3b7f92d735db79f7c4c0d96debe7fff8e3c05f11
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out bc1f97dd49ccc905d5e32d5a02a62bb692d444a6
>   ql/src/test/results/clientpositive/llap/lineage3.q.out cf388161272002ad6097839ceaea2bcfbcf9b7ef
>   ql/src/test/results/clientpositive/llap/mapjoin_hint.q.out 3c6270a05240097e9645e664ed30e0568052d98e
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out feadbcd874818a78e2cc30b86cdebc1e4cb6a04f
>   ql/src/test/results/clientpositive/llap/vectorization_13.q.out 398cb56915f1b24b7c4dc325b60cb114d7ff2b8c
>   ql/src/test/results/clientpositive/llap/vectorization_6.q.out 70542ac7bd69e46098cf8158cae347e6c896c5b2
>   ql/src/test/results/clientpositive/llap/vectorization_8.q.out 662409d4f148cc3da5c4f788ddf59c6f40ede572
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out a59a586144fc9dc14ac2fa87177c189feae47402
>   ql/src/test/results/clientpositive/mapjoin47.q.out c42094d7858fa70626a5184485a13fdacd45be7c
>   ql/src/test/results/clientpositive/parquet_vectorization_13.q.out e60548cb779826c0e50b5086cabe0b1408f4f182
>   ql/src/test/results/clientpositive/parquet_vectorization_6.q.out 85b075666f9cdcd15b2adbab07d665b8def863c0
>   ql/src/test/results/clientpositive/parquet_vectorization_8.q.out c089aab12554caa6ea38d1a0d90abf0c050b8ba0
>   ql/src/test/results/clientpositive/ppd_udf_col.q.out dfc2d0465d49f7c77452113d3791104a05aa42ef
>   ql/src/test/results/clientpositive/spark/cbo_simple_select.q.out e61300b5c853eb733d4443c047344e3fc6fe0ff3
>   ql/src/test/results/clientpositive/spark/parquet_vectorization_13.q.out 78a2428cfbd508c768100896955b8853a9bd2a50
>   ql/src/test/results/clientpositive/spark/parquet_vectorization_6.q.out 362d19c39b60ec2193fc73098787723042f53aab
>   ql/src/test/results/clientpositive/spark/parquet_vectorization_8.q.out b10b550009d383f84f19ac72207a9c94b12c497d
>   ql/src/test/results/clientpositive/spark/subquery_scalar.q.out af325200ba7ae7aa5ab0aa42ecf714eaa4afb122
>   ql/src/test/results/clientpositive/spark/vectorization_13.q.out a49738ecaac6324d529954ca14b952b4898e33dd
>   ql/src/test/results/clientpositive/spark/vectorization_6.q.out 8f7d3f2530da2afc10e8dfb23339568279d8c454
>   ql/src/test/results/clientpositive/spark/vectorization_short_regress.q.out 3844c79e1a3e1d37b8b93e4f39bc9b52992888f3
>   ql/src/test/results/clientpositive/vectorization_13.q.out d2f34481371122b2d6a33578a68310b739eb023f
>   ql/src/test/results/clientpositive/vectorization_6.q.out 99d917fdfa101cba42a1f91b5ac39d9401584a7d
>   ql/src/test/results/clientpositive/vectorization_8.q.out b9b0c8af17b13cebb2301012fadf3395786b4f62
> 
> 
> Diff: https://reviews.apache.org/r/69019/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zoltan Haindrich
> 
>

