hive-dev mailing list archives

From "Tongjie Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6783) Incompatible schema for maps between parquet-hive and parquet-pig
Date Mon, 31 Mar 2014 22:57:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955845#comment-13955845 ]

Tongjie Chen commented on HIVE-6783:
------------------------------------

The fix presented in this JIRA tags the map type with the appropriate OriginalType.

Hive remains backward compatible because, when parquet-hive converts the Parquet fields map.key
and map.value back to a Hive map column, it does not check for the MAP_KEY_VALUE OriginalType. Also,
the equals method of GroupType does not check OriginalType at all, so the hive_schema and pig_schema
shown in the description section are treated as equal (the only difference is the OriginalType).

However, parquet-pig's PigSchemaConverter verifies that the map's OriginalType is the correct
one, so it breaks when it reads a map written by Hive.

With the fix, Pig can now read Hive's maps, because Hive writes the map schema with the correct OriginalType.
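
The difference between the two readers can be illustrated with a toy model. This is only a sketch and does not use the parquet-mr API: the Field class, lenient_equals, and strict_map_check are hypothetical names, and the strict check (outer group tagged MAP, inner repeated group tagged MAP_KEY_VALUE) is an assumption based on the pig_schema in the description, not a copy of what PigSchemaConverter does.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Field:
    """Toy stand-in for a Parquet schema node (hypothetical, not parquet-mr)."""
    name: str
    repetition: str
    kind: str                              # "group" or a primitive such as "binary"
    original_type: Optional[str] = None    # e.g. "MAP", "MAP_KEY_VALUE", "UTF8"
    children: Tuple["Field", ...] = ()

    def lenient_equals(self, other: "Field") -> bool:
        # Structure-only comparison that ignores original_type, modelling
        # the behaviour described above for GroupType.equals.
        return (
            self.name == other.name
            and self.repetition == other.repetition
            and self.kind == other.kind
            and len(self.children) == len(other.children)
            and all(a.lenient_equals(b)
                    for a, b in zip(self.children, other.children))
        )


def strict_map_check(column: Field) -> bool:
    # Assumed strict reader: requires the annotations used in pig_schema.
    if column.original_type != "MAP" or not column.children:
        return False
    return column.children[0].original_type == "MAP_KEY_VALUE"


# Column c1 as written by parquet-pig (from the issue description).
pig_c1 = Field("c1", "optional", "group", "MAP", (
    Field("map", "repeated", "group", "MAP_KEY_VALUE", (
        Field("key", "required", "binary", "UTF8"),
        Field("value", "optional", "binary"),
    )),
))

# Column c1 as written by parquet-hive before the fix.
hive_c1 = Field("c1", "optional", "group", "MAP_KEY_VALUE", (
    Field("map", "repeated", "group", None, (
        Field("key", "required", "binary"),
        Field("value", "optional", "binary"),
    )),
))

print(pig_c1.lenient_equals(hive_c1))   # structure-only comparison
print(strict_map_check(pig_c1))         # annotation-aware check on the Pig schema
print(strict_map_check(hive_c1))        # annotation-aware check on the Hive schema
```

In this model the lenient comparison accepts both schemas, while the annotation-aware check accepts only the Pig one, which mirrors why Hive could read either file but Pig could not read Hive's.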


> Incompatible schema for maps between parquet-hive and parquet-pig
> -----------------------------------------------------------------
>
>                 Key: HIVE-6783
>                 URL: https://issues.apache.org/jira/browse/HIVE-6783
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.13.0
>            Reporter: Tongjie Chen
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6783.1.patch.txt, HIVE-6783.2.patch.txt, HIVE-6783.3.patch.txt, HIVE-6783.4.patch.txt
>
>
> See also the following parquet-mr issue:
> https://github.com/Parquet/parquet-mr/issues/290
> The schema written for maps isn't compatible between Hive and Pig. This means any files written in one cannot be properly read in the other.
> More specifically, for the same map column c1, parquet-pig generates the schema:
> message pig_schema {
>   optional group c1 (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required binary key (UTF8);
>       optional binary value;
>     }   
>   }
> }
> while parquet-hive generates the schema:
> message hive_schema {
>   optional group c1 (MAP_KEY_VALUE) {
>     repeated group map {
>       required binary key;
>       optional binary value;
>     }
>   }
> }



--
This message was sent by Atlassian JIRA
(v6.2#6252)
