hive-dev mailing list archives

From "Hive QA (JIRA)" <>
Subject [jira] [Commented] (HIVE-6166) JsonSerDe is too strict about table schema
Date Thu, 09 Jan 2014 09:35:50 GMT


Hive QA commented on HIVE-6166:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4905 tests executed
*Failed tests:*

Test results:
Console output:

Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed

This message is automatically generated.


> JsonSerDe is too strict about table schema
> ------------------------------------------
>                 Key: HIVE-6166
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Serializers/Deserializers
>    Affects Versions: 0.12.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-6166.2.patch, HIVE-6166.patch
> JsonSerDe is too strict about schemas: it errors out if it finds a subfield whose key name does not map to an appropriate type/schema in the table or in an inner struct.
> For example, if a schema specifies "s:struct<a:int,b:string>,k:int" and we pass it data that looks like the following:
> {noformat}
> { "x" : "abc" , "s" : { "a" : 2 , "b" : "blah", "c": "woo" } }
> {noformat}
> This should still pass, and the record should be read as if it were 
> {noformat}
> { "s" : { "a" : 2 , "b" : "blah" }, "k" : null }
> {noformat}
> This will allow the JsonSerDe to be used with a wider set of data whose structure does not map exactly onto the declared table schema.
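A minimal sketch of the lenient mapping described above, assuming the JSON record has already been parsed into nested `Map`s (for example by a JSON library). This is not the HCatalog JsonSerDe code: the `Field` record, `toRow`, and the string type names are hypothetical stand-ins for Hive's schema objects, and type checking is deliberately left out. It walks the declared schema in order, looks each declared name up in the data, recurses into structs, and lets undeclared keys ("x", "c") fall away while missing columns ("k") become null.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LenientJsonMapping {

    // Hypothetical schema node: a column name plus a primitive type, or "struct" with nested fields.
    record Field(String name, String type, List<Field> children) {
        static Field prim(String name, String type) { return new Field(name, type, List.of()); }
        static Field struct(String name, Field... kids) { return new Field(name, "struct", List.of(kids)); }
    }

    // Maps an already-parsed JSON object onto the declared fields, in schema order.
    // Keys the schema does not declare are never read; declared fields absent from the data become null.
    static List<Object> toRow(Map<String, Object> json, List<Field> schema) {
        List<Object> row = new ArrayList<>();
        for (Field f : schema) {
            Object value = json.get(f.name());
            if (value != null && f.type().equals("struct")) {
                @SuppressWarnings("unchecked")
                Map<String, Object> inner = (Map<String, Object>) value; // type checking omitted in this sketch
                value = toRow(inner, f.children());
            }
            row.add(value);
        }
        return row;
    }

    public static void main(String[] args) {
        // Schema "s:struct<a:int,b:string>,k:int" from the issue description.
        List<Field> schema = List.of(
            Field.struct("s", Field.prim("a", "int"), Field.prim("b", "string")),
            Field.prim("k", "int"));

        // { "x" : "abc" , "s" : { "a" : 2 , "b" : "blah", "c": "woo" } }
        Map<String, Object> s = new LinkedHashMap<>();
        s.put("a", 2);
        s.put("b", "blah");
        s.put("c", "woo");
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("x", "abc");
        record.put("s", s);

        System.out.println(toRow(record, schema)); // prints [[2, blah], null]
    }
}
```

Driving the lookup from the schema rather than from the data is what makes extra keys harmless: nothing ever asks for "x" or "c".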
> Note, we are still strict about a couple of things:
> a) If a column is declared in the schema, its type cannot vary; a mismatch is still considered an error. For example, if the Hive table schema says k1 is a boolean, k1 cannot change into an int or a struct in the data.
> b) The JsonSerDe still attempts to map Hive internal column names. That is, if the data contains a column named "_col2" and "_col2" is not declared directly in the schema, it maps to column position 2 in that schema/subschema rather than being ignored. This keeps tables created with CTAS working.
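The two rules that stay strict can be sketched together for a flat (non-nested) schema. This is an illustrative assumption, not the patch itself: `StrictRules`, `positionOf`, `checkType`, and the string type names are hypothetical, and the "_colN" position is assumed to be a zero-based index into the declared columns. A declared name always wins; otherwise a key of the form "_colN" maps to position N if that position exists; any other unknown key is ignored. Once a key resolves to a declared column, a value of the wrong type is rejected (rule a).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictRules {

    // Hypothetical internal-name pattern: "_col" followed by a (bounded) column position.
    private static final Pattern INTERNAL_NAME = Pattern.compile("_col(\\d{1,9})");

    // Returns the column position for a JSON key, or -1 if the key should be ignored.
    static int positionOf(String key, List<String> names) {
        int pos = names.indexOf(key);
        if (pos >= 0) {
            return pos;                              // a declared name always wins
        }
        Matcher m = INTERNAL_NAME.matcher(key);
        if (m.matches()) {
            int p = Integer.parseInt(m.group(1));
            if (p < names.size()) {
                return p;                            // rule b): "_colN" maps to position N (zero-based, assumed)
            }
        }
        return -1;                                   // unknown key: ignore it
    }

    // Rule a): a non-null value for a declared column must match the declared type.
    static void checkType(String name, String type, Object value) {
        if (value == null) {
            return;
        }
        boolean ok = switch (type) {
            case "int" -> value instanceof Integer;
            case "boolean" -> value instanceof Boolean;
            case "string" -> value instanceof String;
            default -> throw new IllegalArgumentException("unsupported type " + type);
        };
        if (!ok) {
            throw new IllegalArgumentException("column " + name + " is declared " + type
                + " but got " + value.getClass().getSimpleName());
        }
    }

    static List<Object> toRow(Map<String, Object> json, List<String> names, List<String> types) {
        List<Object> row = new ArrayList<>(Collections.nCopies(names.size(), null));
        for (Map.Entry<String, Object> e : json.entrySet()) {
            int pos = positionOf(e.getKey(), names);
            if (pos < 0) {
                continue;                            // lenient: undeclared keys are skipped
            }
            checkType(names.get(pos), types.get(pos), e.getValue());
            row.set(pos, e.getValue());
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> names = List.of("k1", "k2", "k3");
        List<String> types = List.of("boolean", "string", "int");

        Map<String, Object> ctasStyle = new LinkedHashMap<>();
        ctasStyle.put("k1", true);
        ctasStyle.put("_col2", 7);                   // not declared by name, so it lands in position 2 (k3)
        ctasStyle.put("extra", "ignored");
        System.out.println(toRow(ctasStyle, names, types)); // prints [true, null, 7]

        try {
            toRow(Map.of("k1", 1), names, types);    // k1 is declared boolean
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());     // prints column k1 is declared boolean but got Integer
        }
    }
}
```

Checking the declared name before the "_colN" pattern means a table that genuinely declares a column called "_col2" still gets that column by name.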

This message was sent by Atlassian JIRA
