hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-10722) external table creation in Hive can create unusable partition
Date Mon, 18 May 2015 23:20:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-10722:
------------------------------------
    Description: 
Directories in HDFS can have names containing unprintable characters. In hadoop fs -ls output
these characters are not visible at all; they can only be seen if, for example, the output is
piped through od.
When such directories are loaded via msck, the characters are stored in e.g. mysql as "?" (a
literal question mark, findable via LIKE '%?%' in the db) and show up accordingly in Hive.
However, datanucleus appears to encode the character as %3F, which makes the partition unusable:
it cannot be dropped, and other operations such as drop table get stuck (I didn't investigate
in detail why; drop table got unstuck as soon as the partition was removed from the metastore).
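
To make the mismatch concrete, here is a minimal Python sketch. It is not Hive or datanucleus code. The directory name is hypothetical, the helper find_unprintable is invented for illustration, and urllib.parse.quote merely stands in for whatever encoding datanucleus applies. It shows how a control character can be located in a name, and how the stored "?" becomes %3F once percent-encoded.

```python
from urllib.parse import quote


def find_unprintable(name: str) -> list[tuple[int, str]]:
    """Return (index, escaped form) for every character str.isprintable() rejects."""
    return [
        (i, ch.encode("unicode_escape").decode("ascii"))
        for i, ch in enumerate(name)
        if not ch.isprintable()
    ]


# Hypothetical partition directory name with an embedded control character (0x01).
dir_name = "ds=2015-05\x0118"

# Locate the invisible character, much as piping ls output through od would reveal it.
hits = find_unprintable(dir_name)

# Assumption: the backing database replaces the character with a literal '?'.
stored = "".join(ch if ch.isprintable() else "?" for ch in dir_name)

# Assumption: percent-encoding stands in for the encoding datanucleus appears to apply.
encoded = quote(stored, safe="=-")

print(hits)     # position and escaped form of each unprintable character
print(stored)   # name as it would appear in the database
print(encoded)  # name after percent-encoding
```

The stored and encoded forms no longer match each other, nor the original directory name, which is consistent with the partition becoming impossible to address.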

We should probably have a 2-way option for such cases: error out on load (default), or convert
such characters to '?' or drop them (and end up with a partition that actually works, too).
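
The proposed option could look roughly like the Python sketch below. It is an illustration only: BadCharPolicy and sanitize_partition_name are hypothetical names, not existing Hive settings or APIs, and str.isprintable() is an assumed definition of "unprintable".

```python
import enum


class BadCharPolicy(enum.Enum):
    """Hypothetical policy values; not an existing Hive configuration."""
    ERROR = "error"      # fail the load (the proposed default)
    REPLACE = "replace"  # convert each unprintable character to '?'
    DROP = "drop"        # remove unprintable characters entirely


def sanitize_partition_name(name: str,
                            policy: BadCharPolicy = BadCharPolicy.ERROR) -> str:
    """Apply the chosen policy to a partition directory name."""
    bad = [i for i, ch in enumerate(name) if not ch.isprintable()]
    if not bad:
        return name  # nothing to do for clean names
    if policy is BadCharPolicy.ERROR:
        raise ValueError(
            f"unprintable characters at positions {bad} in partition name {name!r}"
        )
    replacement = "?" if policy is BadCharPolicy.REPLACE else ""
    return "".join(ch if ch.isprintable() else replacement for ch in name)


# Usage with a hypothetical directory name containing a 0x01 byte.
raw = "ds=2015-05\x0118"
replaced = sanitize_partition_name(raw, BadCharPolicy.REPLACE)
dropped = sanitize_partition_name(raw, BadCharPolicy.DROP)
print(replaced, dropped)
```

Whether the REPLACE branch yields a usable partition depends on the open question of how '?' itself behaves with datanucleus.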

We should also check if partitions with '?' inserted explicitly work at all with datanucleus.

  was:
There can be directories in HDFS containing unprintable characters; when doing hadoop fs -ls,
these characters are not even visible, and can only be seen for example if output is piped
thru od.
When these are loaded, they are stored in e.g. mysql as "?" (literal question mark, findable
via LIKE '%?%' in db) and show accordingly in Hive.
However, datanucleus appears to encode it as %3F; this causes the partition to be unusable
- it cannot be dropped, and other operations like drop table get stuck (didn't investigate
in detail why; drop table got unstuck as soon as the partition was removed from metastore).

We should probably have a 2-way option for such cases - error out on load (default), or convert
to '?'/drop such characters (and have partition that actually works, too).

We should also check if partitions with '?' inserted explicitly work at all with datanucleus.


> external table creation in Hive can create unusable partition
> -------------------------------------------------------------
>
>                 Key: HIVE-10722
>                 URL: https://issues.apache.org/jira/browse/HIVE-10722
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.1, 1.0.0
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
