hive-dev mailing list archives

From "Teddy Choi (JIRA)" <>
Subject [jira] [Commented] (HIVE-4100) Improve regex_replace UDF to allow non-ascii characters
Date Tue, 13 Aug 2013 08:52:48 GMT


Teddy Choi commented on HIVE-4100:

If we allow the "\uffff" form, then "\UDFBa" in "hive\ql\udf\" may be parsed as a
Unicode character, which makes such strings ambiguous.

How about this way?
{code}REGEXP_REPLACE(some_column, "[^\\u0000-\\uffff]", UNESCAPE_UNICODE("\ufffd")){code}
In this example, UNESCAPE_UNICODE is just a placeholder name.
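
To make the proposal concrete, here is a minimal Java sketch of what such an explicit unescape helper might do. The class name, the method name, and the rule that only a lowercase "\u" followed by exactly four hex digits is converted are all assumptions for illustration. None of this is existing Hive code or a committed design.

```java
class UnescapeUnicodeSketch {
    // Hypothetical helper: replaces each backslash-u-XXXX sequence (lowercase u,
    // exactly four hex digits) with the UTF-16 code unit it names.
    // Every other character is copied through unchanged.
    public static String unescapeUnicode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '\\'
                    && i + 6 <= s.length()
                    && s.charAt(i + 1) == 'u'
                    && isHex(s.charAt(i + 2)) && isHex(s.charAt(i + 3))
                    && isHex(s.charAt(i + 4)) && isHex(s.charAt(i + 5))) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        // Explicit opt-in: only the argument handed to the helper is unescaped.
        String replacement = unescapeUnicode("\\ufffd");
        System.out.println(replacement.charAt(0) == 0xFFFD);

        // A path-like literal that is never passed to the helper keeps its backslashes.
        String path = "hive\\ql\\udf\\UDFBa";
        System.out.println(path);

        // If unescaping were applied to every literal, this text would change silently.
        String ambiguous = unescapeUnicode("dir\\udfba");
        System.out.println(ambiguous.length());
    }
}
```

With an explicit function call, the author of the query decides which literals are unescaped, so path-like strings are left alone unless they are passed in on purpose. The last case in main shows the ambiguity: applied blindly, the four characters after "\u" collapse into a single code unit.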
> Improve regex_replace UDF to allow non-ascii characters
> -------------------------------------------------------
>                 Key: HIVE-4100
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>    Affects Versions: 0.10.0
>            Reporter: Mark Grover
>            Assignee: Mark Grover
> There have been a few email threads on the user mailing list regarding the regex_replace
UDF not supporting non-ASCII characters. We should validate that and improve the UDF to allow
it. The Translate UDF will be a good reference, since it does that by using code points instead
of characters.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators