hive-dev mailing list archives

From "Sushanth Sowmyan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5105) HCatSchema.remove(HCatFieldSchema hcatFieldSchema) does not clean up fieldPositionMap
Date Fri, 16 Aug 2013 21:37:48 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742626#comment-13742626 ]

Sushanth Sowmyan commented on HIVE-5105:
----------------------------------------

+1, Thanks for the test as well, Eugene. :)
                
> HCatSchema.remove(HCatFieldSchema hcatFieldSchema) does not clean up fieldPositionMap
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-5105
>                 URL: https://issues.apache.org/jira/browse/HIVE-5105
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.12.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>             Fix For: 0.12.0
>
>         Attachments: HIVE-5105.patch
>
>
> org.apache.hcatalog.data.schema.HCatSchema.remove(HCatFieldSchema hcatFieldSchema) makes the following call:
> fieldPositionMap.remove(hcatFieldSchema);
> but fieldPositionMap is of type Map<String, Integer>, so an HCatFieldSchema passed as the key never matches any entry and nothing is removed.
> Here's a detailed comment from [~sushanth]
> The result is that the name will not be removed from fieldPositionMap. This has two consequences:
> a) If anyone tries to append a field to an HCatSchema after having removed that field, the append should succeed, but it will fail.
> b) If anyone asks for the position of the removed field by name, it will still return a position.
> Now, there is only one place in HCat code where we remove a field, and that is called from HCatOutputFormat.setSchema, where we check whether the user specified partition column names in the schema when they shouldn't have, and if they did, we remove them. Normally, people do not specify these, so this check tends to be superfluous.
> Once we do this, we wind up serializing that new object (after performing some validations), and the stale entry does appear to survive the serialization (and eventual deserialization), which is very worrying.
> However, we are luckily saved by the fact that we never append that field afterwards (all appends in HCat code are done on newly initialized HCatSchema objects which have had no removes done on them), and we don't ask for the position of something we do not expect to be there (harder to verify for certain, but this seems to be the case on inspection).
> What worries me most is that HCatSchema is part of our public interface for HCat: M/R programs that use HCat can use it, so they might have more unusual usage patterns that hit this bug.
> I can't think of any currently open bugs that are caused by this, because of the rarity of the situation, but it is nevertheless something we should fix immediately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
