Date: Wed, 27 Feb 2019 08:09:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Work logged] (HIVE-13482) str_to_map function delimiters are regex

[ https://issues.apache.org/jira/browse/HIVE-13482?focusedWorklogId=205040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-205040 ]

ASF GitHub Bot logged work on HIVE-13482:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Feb/19 08:08
            Start Date: 27/Feb/19 08:08
    Worklog Time Spent: 10m
      Work Description: MichaelChirico commented on pull request #553: [HIVE-13482][UDF] Explicitly define str_to_map args as regex
URL: https://github.com/apache/hive/pull/553

Successor to https://github.com/apache/spark/pull/23888

See discussion there for some more details about the Hive side of this, in particular [my comment here](https://github.com/apache/spark/pull/23888#issuecomment-467742127) about existing StackOverflow answers and [here](https://github.com/apache/spark/pull/23888#issuecomment-467747788):

> My conclusion is that it's eminently ambiguous whether the _intended_ behavior in either Hive or SparkSQL is to treat the delimiters as regular expressions.
> BUT the behavior has been around for [8 years](https://github.com/apache/hive/commit/4f8294e578db449294a1186f0ac4efb041445dcb) and at least going off of the SO answers, it seems to be accepted as "known" behavior so things will probably break if we change it.

Thus, this PR intends to solidify the interpretation of `delimiter1` and `delimiter2` as regular expressions once and for all. If the non-regexp behavior is strongly desired, eventually there could be a `fixed: bool` argument that behaves like the identically-named argument in R regular expression functions like [`gsub`](http://astrostatistics.psu.edu/su07/R/html/base/html/grep.html) and [`strsplit`](http://astrostatistics.psu.edu/su07/R/html/base/html/strsplit.html)...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Issue Time Tracking
-------------------

            Worklog Id:     (was: 205040)
            Time Spent: 10m
    Remaining Estimate: 0h

> str_to_map function delimiters are regex
> ----------------------------------------
>
>                 Key: HIVE-13482
>                 URL: https://issues.apache.org/jira/browse/HIVE-13482
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>    Affects Versions: 1.0.0
>            Reporter: Janick Bernet
>            Assignee: Jason Dere
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The two delimiters passed to the 'str_to_map' function are both interpreted as regular expressions, which means that using the pipe ('|') as a delimiter will lead to very unexpected results.
> This behaviour is the same for the closely related 'split' function, however that is clearly documented in the function description (as per https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF).
> Either the documentation for 'str_to_map' should be updated to reflect that the delimiters are both regular expressions, too, or the implementation should be changed to not interpret them as regexes.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
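To make the pipe problem described in the issue concrete, here is a minimal Java sketch. It is not Hive's UDF source: `StrToMapSketch` and `strToMap` are hypothetical names, and the sketch only assumes that both delimiters end up in `String.split`, which is what treats them as regular expressions. An unescaped `|` is an empty alternation that matches between every character, so the input is cut into single-character pieces instead of key/value pairs. Escaping the delimiter, or wrapping it in `Pattern.quote`, gives the literal behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class StrToMapSketch {
    // Hypothetical stand-in for str_to_map (an assumption, not Hive's code):
    // both delimiters are passed to String.split, so both are regexes.
    static Map<String, String> strToMap(String text, String delimiter1, String delimiter2) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : text.split(delimiter1)) {
            // Limit 2 so that only the first delimiter2 match splits key from value.
            String[] kv = pair.split(delimiter2, 2);
            result.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a:1|b:2";

        // Unescaped pipe: the regex matches the empty string everywhere,
        // so every character becomes its own "pair".
        System.out.println(strToMap(input, "|", ":"));

        // Literal pipe, either escaped by hand or quoted by the regex library.
        System.out.println(strToMap(input, "\\|", ":"));
        System.out.println(strToMap(input, Pattern.quote("|"), ":"));
    }
}
```

In a HiveQL string literal the backslash may itself need escaping, so the exact spelling of the escaped delimiter in a query can differ from the Java literal shown here.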