hive-dev mailing list archives

From "Justin Leet (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig
Date Tue, 30 Dec 2014 17:32:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261275#comment-14261275 ]

Justin Leet commented on HIVE-7898:
-----------------------------------

This already happens in my patch. HCatStorer will abort with an error, e.g. "Field
named <field> already exists". The check isn't in HCatBaseStorer itself; it occurs
during the conversion from Pig Schema to HCatSchema in convertPigSchemaToHCatSchema().
The modified getColFromSchema passes the truncated name, so convertPigSchemaToHCatSchema()
attempts to add a column whose name is now a duplicate, and HCat rejects the duplicated
field.
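
To make the failure mode concrete, here is a minimal Java sketch of the behavior described above. It is not the HCatalog code: the class TruncatedAliasDemo and its methods stripNamespace and buildSchema are hypothetical stand-ins, and the exception type is an assumption. Only the error text follows the message quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the actual HCatalog implementation.
class TruncatedAliasDemo {

    // Drop everything up to and including the last "::" (A::B::C::d -> d).
    static String stripNamespace(String alias) {
        int idx = alias.lastIndexOf("::");
        return idx < 0 ? alias : alias.substring(idx + 2);
    }

    // Collect truncated column names, rejecting a name that was already added.
    // The exception type is an assumption made for this sketch.
    static List<String> buildSchema(List<String> pigAliases) {
        Set<String> seen = new LinkedHashSet<>();
        for (String alias : pigAliases) {
            String name = stripNamespace(alias);
            if (!seen.add(name)) {
                throw new IllegalArgumentException(
                        "Field named " + name + " already exists");
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Distinct names after truncation: accepted.
        System.out.println(buildSchema(Arrays.asList("A::a", "B::c")));
        // A::b and B::b both truncate to b: rejected.
        try {
            buildSchema(Arrays.asList("A::b", "B::b"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, two aliases that differ only by namespace collide once truncated, so the store fails rather than silently writing one field over the other.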

> HCatStorer should ignore namespaces generated by Pig
> ----------------------------------------------------
>
>                 Key: HIVE-7898
>                 URL: https://issues.apache.org/jira/browse/HIVE-7898
>             Project: Hive
>          Issue Type: Improvement
>          Components: HCatalog
>    Affects Versions: 0.13.1
>            Reporter: Justin Leet
>            Assignee: Justin Leet
>            Priority: Minor
>         Attachments: HIVE-7898.1.patch
>
>
> Currently, Pig aliases must exactly match the names of HCat columns for HCatStorer to
> be successful. However, several Pig operations prepend a namespace to the alias in order
> to differentiate fields (e.g. after a group with field b, you might have A::b). In this case,
> even if the fields are in the right order and the alias without namespace matches, the store
> will fail because it tries to match the long form of the alias, despite the namespace being
> extraneous information in this case. Note that multiple aliases can be applied (e.g. A::B::C::d).
> A workaround is possible by doing a 
> FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc.  
> This quickly becomes tedious and bloated for tables with many fields.
> Changing this would normally require care around columns named, for example, `A::b`, which
> became possible in Hive 0.13. However, a different function call only validates Pig aliases
> if they follow the old rules for Hive columns. As such, a direct change (rather than attempting
> to match either the namespace::alias or just the alias) maintains compatibility for now.
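
As a rough illustration of the difference between the current matching and the proposed direct change, the following Java sketch compares Pig aliases with HCat column names position by position. Everything in it is hypothetical: AliasMatchDemo, matchesExactly, and matchesIgnoringNamespace are names invented for this example, the positional comparison is a simplifying assumption, and case handling is ignored.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the actual HCatStorer code.
class AliasMatchDemo {

    // Remove any Pig namespace prefix: A::B::C::d -> d.
    static String stripNamespace(String alias) {
        int idx = alias.lastIndexOf("::");
        return idx < 0 ? alias : alias.substring(idx + 2);
    }

    // Behavior described in the issue: the long form of the alias must match.
    static boolean matchesExactly(List<String> pigAliases, List<String> hcatColumns) {
        return pigAliases.equals(hcatColumns);
    }

    // Proposed behavior: compare each alias after dropping its namespace.
    static boolean matchesIgnoringNamespace(List<String> pigAliases,
                                            List<String> hcatColumns) {
        if (pigAliases.size() != hcatColumns.size()) {
            return false;
        }
        for (int i = 0; i < pigAliases.size(); i++) {
            if (!stripNamespace(pigAliases.get(i)).equals(hcatColumns.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> aliases = Arrays.asList("A::b", "A::B::C::d");
        List<String> columns = Arrays.asList("b", "d");
        System.out.println(matchesExactly(aliases, columns));
        System.out.println(matchesIgnoringNamespace(aliases, columns));
    }
}
```

With the aliases A::b and A::B::C::d and the columns b and d, the exact comparison fails while the namespace-stripped comparison succeeds, which is the case the FOREACH workaround currently has to cover by hand.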



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
