hbase-issues mailing list archives

From "Jerry He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16247) SparkSQL Avro serialization doesn't handle enums correctly
Date Fri, 17 Mar 2017 17:23:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930336#comment-15930336 ]

Jerry He commented on HBASE-16247:
----------------------------------

Just want to clarify a little more; I hope I have this right.  As soon as the hbase spark
data source implementation sees the Avro schema for a column, it converts the Avro types to
Spark SQL types. ENUM is mapped to StringType. After that, the ENUM type no longer appears
anywhere in the Catalyst data flow, including serialization and deserialization in the hbase
spark data source.
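
To make the point above concrete, here is a minimal sketch in plain Java. It is not the hbase spark data source code and it does not use the Avro or Spark SQL APIs: AvroType, SqlType, toSqlType and toAvroType are hypothetical stand-ins, and the reverse mapping assumes the conversion back works from the Spark SQL type alone. Under those assumptions, an ENUM that has been mapped to a string type cannot be recovered as an ENUM on the way back.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Avro
// or Spark SQL classes, and this is not the hbase-spark implementation.
final class EnumMappingSketch {
    enum AvroType { ENUM, STRING, INT, LONG }
    enum SqlType { STRING_TYPE, INTEGER_TYPE, LONG_TYPE }

    // Forward conversion: ENUM and STRING both collapse to STRING_TYPE.
    static SqlType toSqlType(AvroType avroType) {
        switch (avroType) {
            case ENUM:
            case STRING:
                return SqlType.STRING_TYPE;
            case INT:
                return SqlType.INTEGER_TYPE;
            case LONG:
                return SqlType.LONG_TYPE;
            default:
                throw new IllegalArgumentException("unsupported: " + avroType);
        }
    }

    // Reverse conversion (assumed to see only the SQL type): there is no
    // way to tell that a STRING_TYPE column was originally an ENUM.
    static AvroType toAvroType(SqlType sqlType) {
        switch (sqlType) {
            case STRING_TYPE:
                return AvroType.STRING;
            case INTEGER_TYPE:
                return AvroType.INT;
            case LONG_TYPE:
                return AvroType.LONG;
            default:
                throw new IllegalArgumentException("unsupported: " + sqlType);
        }
    }

    public static void main(String[] args) {
        SqlType sql = toSqlType(AvroType.ENUM);
        AvroType back = toAvroType(sql);
        // Prints: ENUM -> STRING_TYPE -> STRING
        System.out.println(AvroType.ENUM + " -> " + sql + " -> " + back);
    }
}
```

The round trip loses the ENUM type because two source types share one target type; any serializer driven only by the target type would have to consult the original Avro schema to emit an enum symbol.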

> SparkSQL Avro serialization doesn't handle enums correctly
> ----------------------------------------------------------
>
>                 Key: HBASE-16247
>                 URL: https://issues.apache.org/jira/browse/HBASE-16247
>             Project: HBase
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 2.0.0
>            Reporter: Sean Busbey
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> Avro's generic API expects GenericEnumSymbol as the runtime type for instances of fields
> that are of Avro type ENUM. The Avro 1.7 libraries are lax in some cases for handling this,
> but the 1.8 libraries are strict. We should proactively fix our serialization.
> (the lax serialization in 1.7 fails for some nested use in unions, see AVRO-997 for details)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
