Date: Sat, 25 Jul 2015 20:07:04 +0000 (UTC)
From: "eugeny birukov (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Updated] (HIVE-11373) Incorrect (de)serialization STRING field to MAP in TRANSFORM operation

     [ https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

eugeny birukov updated HIVE-11373:
----------------------------------
    Description:
I am trying to transform a JSON string into a Map using Python code:
    import re
    import sys
    for d in sys.stdin:
        r = re.sub('[:,]', '\003', re.sub('[{}\"]', '', d))
        print(r.strip())
echo '{"key1":"valu1","key2":"value2"}' | ./json2map.py
key1^Cvalu1^Ckey2^Cvalue2    (^C denotes the \003 control character)
This string should be converted to the Hive type MAP,
but the transformation result is displayed as {"key1":"valu1\u0003key2\u0003value2"}

With a single key-value entry it works fine:
hive> SELECT TRANSFORM ('{"key1":"valu1"}') USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP) FROM json;
...
{"key1":"valu1"}
Time taken: 35.177 seconds, Fetched: 1 row(s)

With multiple key-value entries the result is incorrect:
hive> SELECT TRANSFORM ('{"key1":"valu1","key2":"value2"}') USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP) FROM json;
...
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 33.486 seconds, Fetched: 1 row(s)

Full steps to reproduce:
echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;
hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;"
hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP) FROM json;"
converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_20150725150000_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected result: {"key1":"valu1","key2":"value2"}
Actual result: {"key1":"valu1\u0003key2\u0003value2"}

  was:
I am trying to transform a JSON string into a Map using Python code:
    import re
    import sys
    for d in sys.stdin:
        r = re.sub('[:,]', '\003', re.sub('[{}\"]', '', d))
        print(r.strip())
echo '{"key1":"valu1","key2":"value2"}' | ./json2map.py
key1^Cvalu1^Ckey2^Cvalue2    (^C denotes the \003 control character)
This string should be converted to the Hive type MAP,
but the transformation result is displayed as {"key1":"valu1\u0003key2\u0003value2"}

Steps to reproduce:
echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;
hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;"
hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP) FROM json;"
converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_20150725150000_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected result: {"key1":"valu1","key2":"value2"}
Actual result: {"key1":"valu1\u0003key2\u0003value2"}


> Incorrect (de)serialization STRING field to MAP in TRANSFORM operation
> ----------------------------------------------------------------------
>
>                 Key: HIVE-11373
>                 URL: https://issues.apache.org/jira/browse/HIVE-11373
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.1, 1.0.0
>         Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with HIVE 1.0)
>            Reporter: eugeny birukov
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
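One possible reading of the result reported above, offered as a sketch and not as a confirmed diagnosis: if the TRANSFORM output for the MAP column is parsed with Hive's default nested delimiters (assumed here to be \002 between map entries and \003 between a key and its value), then a script that writes \003 in both positions would yield exactly one entry whose value still contains \003. The Python below illustrates that idea. The names json_to_map_text, parse_map_text and transform are hypothetical helpers, parse_map_text is only a rough stand-in for the deserializer and not Hive code, and a string-to-string map column is assumed.

```python
import json

# Assumed defaults, not confirmed by the report:
ENTRY_SEP = "\x02"  # separates map entries (^B)
KV_SEP = "\x03"     # separates a key from its value (^C)


def json_to_map_text(line):
    """Serialize a flat JSON object as key^Cvalue entries joined by ^B."""
    obj = json.loads(line)
    return ENTRY_SEP.join("%s%s%s" % (k, KV_SEP, v) for k, v in obj.items())


def parse_map_text(text):
    """Rough stand-in for the deserializer under the assumed delimiters:
    split entries on ^B, then split each entry on its first ^C."""
    result = {}
    for entry in text.split(ENTRY_SEP):
        key, _, value = entry.partition(KV_SEP)
        result[key] = value
    return result


def transform(lines):
    """Convert each non-blank input line, as a TRANSFORM script would."""
    for line in lines:
        if line.strip():
            yield json_to_map_text(line)
```

Under these assumptions, parse_map_text applied to the reporter's script output (key1^Cvalu1^Ckey2^Cvalue2) gives a single entry, matching the actual result in the report, while the output of json_to_map_text parses back into two entries. In a real TRANSFORM script the transform generator would be fed sys.stdin and each yielded string printed on its own line; whether Hive then produces the expected MAP has not been checked against the affected versions.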