Date: Sat, 11 Jan 2014 07:56:51 +0000 (UTC)
From: "Ashutosh Chauhan (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-6166) JsonSerDe is too strict about table schema

    [ https://issues.apache.org/jira/browse/HIVE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868704#comment-13868704 ]

Ashutosh Chauhan commented on HIVE-6166:
----------------------------------------

+1

> JsonSerDe is too strict about table schema
> ------------------------------------------
>
>                 Key: HIVE-6166
>                 URL: https://issues.apache.org/jira/browse/HIVE-6166
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Serializers/Deserializers
>    Affects Versions: 0.12.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments:
>                      HIVE-6166.2.patch, HIVE-6166.3.patch, HIVE-6166.patch
>
>
> JsonSerDe is too strict about schema: it errors out whenever it finds a field or subfield whose key name does not map to a declared column of the table schema or of an inner-struct schema.
> Thus, if a schema specifies "s:struct<a:int,b:string>,k:int" and we pass it data that looks like the following:
> {noformat}
> { "x" : "abc" , "s" : { "a" : 2 , "b" : "blah", "c": "woo" } }
> {noformat}
> This should still pass, and the record should be read as if it were
> {noformat}
> { "s" : { "a" : 2 , "b" : "blah"}, "k" : null }
> {noformat}
> This will allow the JsonSerDe to be used with a wider set of data, where the data does not map exactly to the declared table schema.
> Note, we are still strict about a couple of things:
> a) If a column is declared in the schema, its type cannot vary; a mismatch is still considered an error. For example, if the Hive table schema says k1 is a boolean, the data cannot supply an int or a struct for it.
> b) The JsonSerDe still attempts to map Hive internal column names. That is, if the data contains a column named "_col2" and "_col2" is not declared directly in the schema, it will map to column position 2 in that schema/subschema rather than being ignored. This is so that tables created with CTAS will still work.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
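
For readers of the archive, the behaviour described in the issue can be sketched roughly as below. This is an illustration only, not the attached patch and not the JsonSerDe code: the dict-based schema representation, the names project, coerce and SchemaMismatch, and the reading of "column position 2" as a zero-based index are all assumptions made for the example.

```python
import json
import re

# Hypothetical schema model: an ordered dict of column name -> type, where a
# type is "int", "string", "boolean", or a nested dict for a struct.
SCALARS = {"int": int, "string": str, "boolean": bool}

# Hive internal column names look like _col0, _col1, ...
INTERNAL_COL = re.compile(r"^_col(\d+)$")


class SchemaMismatch(ValueError):
    """Raised when a declared column receives a value of the wrong type."""


def coerce(name, expected, value):
    """Check one value against its declared type (rule a in the issue)."""
    if value is None:
        return None
    if isinstance(expected, dict):
        if not isinstance(value, dict):
            raise SchemaMismatch(f"{name}: expected struct, got {value!r}")
        return project(value, expected)
    # Exact type check, because bool is a subclass of int in Python.
    if type(value) is not SCALARS[expected]:
        raise SchemaMismatch(f"{name}: expected {expected}, got {value!r}")
    return value


def project(record, schema):
    """Project a parsed JSON object onto a schema or sub-schema."""
    names = list(schema)
    out = {name: None for name in names}  # missing columns read as null
    for key, value in record.items():
        target = key if key in schema else None
        if target is None:
            # Rule b: map _colN by position (assumed zero-based here).
            match = INTERNAL_COL.match(key)
            if match and int(match.group(1)) < len(names):
                target = names[int(match.group(1))]
        if target is None:
            continue  # undeclared key: ignore instead of erroring out
        out[target] = coerce(target, schema[target], value)
    return out


if __name__ == "__main__":
    # Schema and record taken from the example in the issue description.
    schema = {"s": {"a": "int", "b": "string"}, "k": "int"}
    raw = '{ "x" : "abc" , "s" : { "a" : 2 , "b" : "blah", "c": "woo" } }'
    print(json.dumps(project(json.loads(raw), schema)))
```

Run on the record from the issue, the sketch drops "x" and "c" and fills "k" with null, while a value of the wrong type for a declared column still raises an error.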