hive-dev mailing list archives

From "Navis (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (HIVE-7511) Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
Date Fri, 25 Jul 2014 00:42:38 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis reopened HIVE-7511:
-------------------------


> Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7511
>                 URL: https://issues.apache.org/jira/browse/HIVE-7511
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>         Environment: Windows Server 2008 R2
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>            Priority: Critical
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7511.1.patch
>
>
> When we put non-ASCII UTF-8 characters in the WHERE clause of a Hive query, the results are empty for "where content like '%丄%'" and contain all rows for "where content not like '%丄%';", even though a few rows contain this character.
> Steps to reproduce:
> 1. Save a file called data.txt in the root container. The contents of the file are as follows.
> 190	丄f齄啊c狛䶴h䶴c狝
> 899	d狜狜㐁geg阿狚ea䶴eead狜e
> 137	齄鼾h狝ge㐀狛g狚阿
> 21	﨩﨩e㐀c狛鼾d䶴﨨
> 767	﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨
> 281	﨨㐀啊aga啊c狝e鼾鼾
> 573	㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄
> 966	䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄
> 565	䶵㐀﨩㐀bb狛ehd丄ea丄㐀
> 778	﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵
> 363	gd齄a鼾a䶴b㐁㐁fg鼾
> 822	a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b
> 338	b齄㐁ff阿e狜e㐀ba齄
> 2. Execute the following queries to setup the table.
> a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/hivetable';
> b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;
> 3. Create a query file query.hql with the following contents:
> INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
> select * from hivetable where content like '%丄%';
> 4. Even though a few rows contain this character, the output is empty.
> 5. Change the contents of query.hql to:
> INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
> select * from hivetable where content not like '%丄%';
> 6. The output contains all rows including those containing the given character.
> 7. Similar results are observed when using "where content = '丄f齄啊c狛䶴h䶴c狝';"
> 8. We get expected results when using "where content like '%a%'; "
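
The report does not state a root cause. One hypothesis consistent with steps 4 to 8 is that the query literal is decoded with a single-byte platform charset while the table data is decoded as UTF-8. The sketch below illustrates only that hypothesis in plain Java; it does not call any Hive code. The class and method names are made up for illustration, and ISO-8859-1 stands in for whatever default code page the Windows host might use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LikeCharsetSketch {
    // Decode raw bytes the way a reader configured with the given charset would.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Stand-in for LIKE '%needle%' when the needle has no wildcard characters.
    static boolean likeContains(String content, String needle) {
        return content.contains(needle);
    }

    public static void main(String[] args) {
        // \u4E04 is the character used in the WHERE clause of the report;
        // "\u4E04f" is the first two characters of the content of row 190.
        String rowPrefix = "\u4E04f";

        // Bytes of the literal as they would be stored in a UTF-8 query.hql.
        byte[] literalBytes = "\u4E04".getBytes(StandardCharsets.UTF_8);

        // Same bytes, decoded correctly and (assumed) incorrectly.
        String asUtf8 = decode(literalBytes, StandardCharsets.UTF_8);
        String asLatin1 = decode(literalBytes, StandardCharsets.ISO_8859_1);

        System.out.println(likeContains(rowPrefix, asUtf8));   // true
        System.out.println(likeContains(rowPrefix, asLatin1)); // false
        System.out.println(asLatin1.length());                 // 3
    }
}
```

Under this assumption the mis-decoded literal becomes three unrelated characters, so LIKE and = match nothing and NOT LIKE matches every row, while an ASCII literal such as 'a' is unaffected because its single byte decodes identically in both charsets.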



--
This message was sent by Atlassian JIRA
(v6.2#6252)
