hive-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-13482) str_to_map function delimiters are regex
Date Wed, 27 Feb 2019 08:09:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-13482?focusedWorklogId=205040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-205040 ]

ASF GitHub Bot logged work on HIVE-13482:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Feb/19 08:08
            Start Date: 27/Feb/19 08:08
    Worklog Time Spent: 10m 
      Work Description: MichaelChirico commented on pull request #553: [HIVE-13482][UDF] Explicitly define str_to_map args as regex
URL: https://github.com/apache/hive/pull/553
 
 
   Successor to https://github.com/apache/spark/pull/23888
   
   See discussion there for some more details about the Hive side of this, in particular
   [my comment here](https://github.com/apache/spark/pull/23888#issuecomment-467742127) about existing StackOverflow answers
   and [here](https://github.com/apache/spark/pull/23888#issuecomment-467747788):
   
   > My conclusion is that it's eminently ambiguous whether the _intended_ behavior in
either Hive or SparkSQL is to treat the delimiters as regular expressions.
   
   > BUT the behavior has been around for [8 years](https://github.com/apache/hive/commit/4f8294e578db449294a1186f0ac4efb041445dcb)
and at least going off of the SO answers, it seems to be accepted as "known" behavior so things
will probably break if we change it.
   
   Thus, this PR intends to solidify the interpretation of `delimiter1` and `delimiter2` as
regular expressions once and for all.
   
   If the non-regexp behavior is strongly desired, eventually there could be a `fixed: bool`
argument that behaves like the identically-named argument in R regular expression functions
like [`gsub`](http://astrostatistics.psu.edu/su07/R/html/base/html/grep.html) and [`strsplit`](http://astrostatistics.psu.edu/su07/R/html/base/html/strsplit.html)...
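
A minimal Python sketch of how such a `fixed` flag could behave. The function below is a hypothetical stand-in, not Hive's implementation: the `fixed` parameter does not exist in Hive, and Python's `re` module is used only to illustrate the idea (Java's regex engine may differ in edge cases). The default delimiters `,` and `:` follow the documented `str_to_map` defaults.

```python
import re


def str_to_map(text, delimiter1=",", delimiter2=":", fixed=False):
    """Hypothetical sketch of str_to_map with an R-style `fixed` flag.

    fixed=False: both delimiters are treated as regular expressions.
    fixed=True:  both delimiters are escaped and matched literally.
    """
    if fixed:
        # Escape regex metacharacters so the delimiters match literally.
        delimiter1 = re.escape(delimiter1)
        delimiter2 = re.escape(delimiter2)

    result = {}
    for pair in re.split(delimiter1, text):
        # Split each pair at most once, so values may contain delimiter2.
        parts = re.split(delimiter2, pair, maxsplit=1)
        result[parts[0]] = parts[1] if len(parts) > 1 else None
    return result


# Literal pipe as the pair delimiter, made safe by fixed=True.
literal = str_to_map("a:1|b:2", "|", ":", fixed=True)

# Regex mode: a character class accepts either ';' or ',' between pairs.
pattern = str_to_map("a=1;b=2,c=3", "[;,]", "=")
```

With `fixed=True` a caller can pass `|` directly, while the default keeps the long-standing regex interpretation, so existing queries that rely on it would not change.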
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 205040)
            Time Spent: 10m
    Remaining Estimate: 0h

> str_to_map function delimiters are regex
> ----------------------------------------
>
>                 Key: HIVE-13482
>                 URL: https://issues.apache.org/jira/browse/HIVE-13482
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>    Affects Versions: 1.0.0
>            Reporter: Janick Bernet
>            Assignee: Jason Dere
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The two delimiters passed to the 'str_to_map' function are both interpreted as regular
> expressions, which means that using the pipe ('|') as a delimiter will lead to very unexpected
> results.
> This behaviour is the same for the closely related 'split' function; however, that is
> clearly documented in the function description (as per https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF).
>
> Either the documentation for 'str_to_map' should be updated to reflect that the delimiters
> are both regular expressions, too, or the implementation should be changed to not interpret
> them as regexes.
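
The pipe problem described above can be illustrated with a short Python sketch. This is an analogy using Python's `re.split` (Python 3.7+ assumed), not Hive itself; the exact pieces Hive produces may differ, but the cause is the same: an unescaped `|` is regex alternation between two empty patterns, so it matches the empty string rather than the pipe character.

```python
import re

text = "a:1|b:2"

# Unescaped: "|" is an alternation of two empty patterns, so the split
# happens at empty matches instead of at the literal pipe character.
naive = re.split("|", text)

# Escaped: the pipe is matched literally, giving the two key-value pairs.
escaped = re.split(re.escape("|"), text)

# A character class is another common way to match the pipe literally.
char_class = re.split("[|]", text)

print(naive)
print(escaped)
print(char_class)
```

Only the escaped and character-class forms yield the two pairs `a:1` and `b:2`, which is why a delimiter such as `|` has to be escaped when the argument is interpreted as a regular expression.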



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
