Subject: Review Request 23907: Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
From: "XIAOBING ZHOU"
To: "XIAOBING ZHOU", "hive"
Date: Thu, 24 Jul 2014 23:34:38 -0000

-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/23907/
-----------------------------------------------------------

Review request for hive.


Repository: hive-git


Description
-------

When the WHERE clause of a Hive query contains UTF-8 (non-ASCII) characters, the result is empty for "where content like '%?%'" and contains all rows for "where content not like '%?%';", even though a few rows contain this character.

Steps to reproduce:

1. Save a file called data.txt in the root container. The contents of the file are as follows:

   190 ?f??c??h?c?
   899 d???geg??ea?eead?e
   137 ??h?ge??g??
   21 ??e?c??d??
   767 ?c?g?????????????
   281 ???aga?c?e??
   573 ??hc?b??????hc?
   966 ????e?eb??c????ga??
   565 ????bb?ehd?ea??
   778 ?????bbea??????a?
   363 gd?a?a?b??fg?
   822 a???h?e?h?gac????b
   338 b??ff?e?e?ba?

2. Execute the following queries to set up the table.

   a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/hivetable';

   b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;

3. Create a query file query.hql with the following contents:

   INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput' select * from hivetable where content like '%?%';

4. Even though a few rows contain this character, the output is empty.

5. Change the contents of query.hql to:

   INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput' select * from hivetable where content not like '%?%';

6. The output contains all rows, including those that contain the given character.

7. Similar results are observed when using "where content = '?f??c??h?c?';".

8. The expected results are returned when using "where content like '%a%';".


Diffs
-----

  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 3cdedba

Diff: https://reviews.apache.org/r/23907/diff/


Testing
-------

Tested, resolved the issue.


Thanks,

XIAOBING ZHOU
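A note on the symptoms in the steps to reproduce: they are consistent with the query text being decoded with a charset other than UTF-8 somewhere between query.hql and the query parser, although this request does not state the root cause or describe what the CliDriver.java change does. The Java sketch below only illustrates that assumed failure mode. The class and method names are hypothetical, ISO-8859-1 stands in for whatever non-UTF-8 charset might be in effect, and U+00E9 stands in for the character that appears as '?' above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not code from Hive or from the patch under review.
public class QueryEncodingSketch {

    // Decode the raw bytes of a query file using the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Extract the literal between '% and %' in a "like '%...%'" clause.
    static String likeLiteral(String query) {
        int start = query.indexOf("'%") + 2;
        int end = query.lastIndexOf("%'");
        return query.substring(start, end);
    }

    public static void main(String[] args) {
        // U+00E9 is an assumed stand-in for the non-ASCII character in the report.
        String query = "select * from hivetable where content like '%\u00e9%'";
        byte[] raw = query.getBytes(StandardCharsets.UTF_8);

        // A row value that does contain the character.
        String row = "d\u00e9f";

        // Decoded as UTF-8, the literal is the original single character.
        String good = likeLiteral(decode(raw, StandardCharsets.UTF_8));
        // Decoded as ISO-8859-1, each UTF-8 byte becomes its own character.
        String bad = likeLiteral(decode(raw, StandardCharsets.ISO_8859_1));

        System.out.println(row.contains(good));
        System.out.println(row.contains(bad));
    }
}
```

Under these assumptions the first check prints true and the second prints false: the mis-decoded literal no longer occurs in any row, so a LIKE filter on it would return nothing and a NOT LIKE filter would return every row, which matches steps 4 and 6. An ASCII-only pattern such as '%a%' decodes identically under both charsets, which fits step 8.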