Date: Fri, 17 Jun 2016 09:04:05 +0000 (UTC)
From: "David Nies (JIRA)"
To: dev@hive.apache.org
Subject: [jira] [Created] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

David Nies created HIVE-14044:
---------------------------------

Summary: Newlines in Avro maps cause external table to return corrupt values
Key: HIVE-14044
URL: https://issues.apache.org/jira/browse/HIVE-14044
Project: Hive
Issue Type: Bug
Environment: Hive version: 1.1.0-cdh5.5.1 (bundled with cloudera 5.5.1)
Reporter: David Nies

When Avro files used as the data source for an external table contain {{\n}} characters, the results of {{SELECT}} queries may be corrupt. I encountered this error when querying Hive both from {{beeline}} and over JDBC.

h3. Steps to reproduce (the files used are attached to the ticket)

# Create an {{.avro}} file that contains newline characters in a value of a map:
{code}
avro-tools fromjson --schema-file test.schema test.json > test.avro
{code}
# Copy the {{.avro}} file to HDFS:
{code}
hdfs dfs -copyFromLocal test.avro /some/location/
{code}
# Create an external table in beeline over this {{.avro}} file:
{code}
beeline> CREATE EXTERNAL TABLE broken_newline_map
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/dnies/hive-test/broken-newline/db'
TBLPROPERTIES ('avro.schema.literal'='
{
  "type" : "record",
  "name" : "myEntry",
  "namespace" : "myNamespace",
  "fields" : [
    { "name" : "foo", "type" : "long" },
    { "name" : "bar", "type" : { "type" : "map", "values" : "string" } }
  ]
}
');
{code}
# Selecting from the table may now return corrupt results:
{code}
jdbc:hive2://my-server:10000/> select * from broken_newline_map;
+-------------------------+---------------------------------------------------+--+
| broken_newline_map.foo | broken_newline_map.bar |
+-------------------------+---------------------------------------------------+--+
| 1 | {"key2":"value2","key1":"value1\nafter newline"} |
| 2 | {"key2":"new value2","key1":"new value"} |
+-------------------------+---------------------------------------------------+--+
2 rows selected (1.661 seconds)

jdbc:hive2://my-server:10000/> select foo, map_keys(bar), map_values(bar) from broken_newline_map;
+-------+------------------+-----------------------------+--+
| foo | _c1 | _c2 |
+-------+------------------+-----------------------------+--+
| 1 | ["key2","key1"] | ["value2","value1"] |
| NULL | NULL | NULL |
| 2 | ["key2","key1"] | ["new value2","new value"] |
+-------+------------------+-----------------------------+--+
3 rows selected (28.05 seconds)
{code}

The second result set is clearly corrupt. It returns three rows instead of two, the second row consists entirely of {{NULL}} values, and in row 1 the value of {{key1}} is truncated to {{value1}}: the text after the newline is missing.