hive-dev mailing list archives

From "XIAOBING ZHOU" <xzho...@gmail.com>
Subject Review Request 23907: Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
Date Thu, 24 Jul 2014 23:34:38 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23907/
-----------------------------------------------------------

Review request for hive.


Repository: hive-git


Description
-------

When UTF-8 characters are used in the WHERE clause of a Hive query, the results are empty for "where
content like '%?%'", and the results contain all rows for "where content not like '%?%';", even
when a few rows contain this character.

Steps to reproduce:

1. Save a file called data.txt in the root container. The contents of the file are as follows:

190	?f??c??h?c?
899	d???geg??ea?eead?e
137	??h?ge??g??
21	??e?c??d??
767	?c?g?????????????
281	???aga?c?e??
573	??hc?b??????hc?
966	????e?eb??c????ga??
565	????bb?ehd?ea??
778	?????bbea??????a?
363	gd?a?a?b??fg?
822	a???h?e?h?gac????b
338	b??ff?e?e?ba?

2. Execute the following queries to set up the table.
a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/hivetable';
b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;

3. Create a query file query.hql with the following contents:

INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
select * from hivetable where content like '%?%';

4. Even though a few rows contain this character, the output is empty.

5. Change the contents of query.hql to:

INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
select * from hivetable where content not like '%?%';

6. The output contains all rows, including those containing the given character.

7. Similar results are observed when using "where content = '?f??c??h?c?';".

8. The expected results are returned when using "where content like '%a%';".
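One explanation consistent with steps 4 through 8, offered here as an assumption rather than a diagnosis stated in this review, is that the UTF-8 bytes of the literal in query.hql are decoded with a different charset before the query runs, while the table data is decoded as UTF-8. The mangled literal then never occurs in any row, so LIKE matches nothing, NOT LIKE matches everything, and = fails; a plain ASCII literal such as 'a' decodes identically either way. In the sketch below, ISO-8859-1 stands in for whatever non-UTF-8 default charset the JVM might use, the sample value is hypothetical (the original characters were lost in the archive), and likeContains is a simplified illustrative stand-in for LIKE '%x%', not Hive's implementation.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Hypothetical, simplified stand-in for "value LIKE '%needle%'": a substring test.
    static boolean likeContains(String value, String needle) {
        return value.contains(needle);
    }

    public static void main(String[] args) {
        // Hypothetical sample values; the archived message lost the real characters.
        String literal = "\u00e9";    // 'é', encoded in UTF-8 as the bytes C3 A9
        String content = "caf\u00e9"; // a table value, decoded correctly as UTF-8

        // The literal as it would look if the query file's UTF-8 bytes were read as ISO-8859-1.
        byte[] fileBytes = literal.getBytes(StandardCharsets.UTF_8);
        String misread = new String(fileBytes, StandardCharsets.ISO_8859_1);

        System.out.println(likeContains(content, literal));  // true: correctly decoded literal matches
        System.out.println(likeContains(content, misread));  // false: LIKE finds nothing
        System.out.println(!likeContains(content, misread)); // true: NOT LIKE matches every row

        // ASCII bytes decode the same under both charsets, so '%a%' still behaves.
        String ascii = new String("a".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(likeContains(content, ascii));    // true
    }
}
```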


Diffs
-----

  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 3cdedba 

Diff: https://reviews.apache.org/r/23907/diff/
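The patch itself is only linked, not reproduced, in this message. Assuming the change concerns how CliDriver decodes the query file (presumably run through the Hive CLI, since the diff touches CliDriver.java), a fix of this kind would read the file with an explicit UTF-8 charset instead of the platform default. The sketch below is hypothetical: QueryFileReaderSketch and readQueryFile are illustrative names, not part of Hive, and the sample query uses a stand-in character.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class QueryFileReaderSketch {
    // Hypothetical helper: decode a script file as UTF-8 regardless of the JVM default charset.
    static String readQueryFile(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a query containing a non-ASCII character as UTF-8, then read it back.
        Path tmp = Files.createTempFile("query", ".hql");
        String hql = "select * from hivetable where content like '%\u00e9%';";
        Files.write(tmp, hql.getBytes(StandardCharsets.UTF_8));
        System.out.println(readQueryFile(tmp).equals(hql)); // true: the literal survives intact
        Files.delete(tmp);
    }
}
```

Passing the charset explicitly keeps the result independent of file.encoding or the operating system locale, which is the variable the reproduction steps above suggest is involved.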


Testing
-------

Tested, resolved the issue.


Thanks,

XIAOBING ZHOU

