hive-dev mailing list archives

From "Michael Malak (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema
Date Thu, 14 Feb 2013 22:57:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578783#comment-13578783 ]

Michael Malak commented on HIVE-3528:
-------------------------------------

Sean:

OK, I've researched the problem further.

There is in fact a null-struct test case on line 14 of
https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

The test script at
https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q

did indeed pass when I ran it locally.  In that test, however, the query copies all of its
data verbatim from a test table:

INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

If instead we substitute a hard-coded null for the struct column directly in the query, it fails:

INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, bigint1, boolean1,
float1, double1, list1, map1, null, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

with the following error:

FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because
column number/types are different 'as_avro': Cannot convert column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.

Note, though, that substituting a hard-coded null for string1 (and restoring struct1 to the
query) does work:

INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, boolean1,
float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

I will be entering an all-new JIRA for this.

                
> Avro SerDe doesn't handle serializing Nullable types that require access to a Schema
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-3528
>                 URL: https://issues.apache.org/jira/browse/HIVE-3528
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>              Labels: avro
>             Fix For: 0.11.0
>
>         Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt
>
>
> Deserialization properly handles hiding Nullable Avro types, including complex types
> like record, map, array, etc. However, when Serialization attempts to write out these types
> it erroneously makes use of the UNION schema that contains NULL and the other type.
> This results in Schema mis-match errors for Record, Array, Enum, Fixed, and Bytes.
> Here's a [review board of unit tests that express the problem|https://reviews.apache.org/r/7431/],
> as well as one that supports the case that it's only when the schema is needed.
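
For readers unfamiliar with the description above: Avro expresses a nullable type as a
union of "null" and the actual type, and types such as record, array, enum, fixed, and bytes
are written against the actual (non-null) schema rather than the union. The following is a
minimal Python sketch of that unwrapping step only. It is not the Hive AvroSerDe code; the
schema literal and the helper name unwrap_nullable are hypothetical, with field names
borrowed from the struct in the error message.

```python
import json

# Hypothetical Avro schema for a nullable struct: a union of "null" and a record.
NULLABLE_STRUCT = json.loads("""
["null",
 {"type": "record", "name": "struct1_record",
  "fields": [{"name": "sint", "type": "int"},
             {"name": "sboolean", "type": "boolean"},
             {"name": "sstring", "type": "string"}]}]
""")


def unwrap_nullable(schema):
    """Return T for a two-branch union of "null" and T; otherwise return schema unchanged."""
    if isinstance(schema, list) and len(schema) == 2 and "null" in schema:
        non_null = [branch for branch in schema if branch != "null"]
        if len(non_null) == 1:
            return non_null[0]
    return schema


# The record schema, not the union, is what a writer needs for a non-null value.
record_schema = unwrap_nullable(NULLABLE_STRUCT)
```

Non-union schemas and unions with more than one non-null branch pass through unchanged, so
the helper is safe to call on every field.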

