hive-dev mailing list archives

From "Edward Capriolo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4070) Like operator in Hive is case sensitive while in MySQL (and most likely other DBs) it's case insensitive
Date Sat, 25 May 2013 15:57:21 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667121#comment-13667121 ]

Edward Capriolo commented on HIVE-4070:
---------------------------------------

The risk, as I see it, is that a large number of people already depend on the
current behaviour. If we change the default, that would change the results current users are
getting. Better that new users learn how Hive works, since they are learning anyway, than
break the assumptions of current users. Most users do not want to have to test heavily before
an upgrade; they want consistent behaviour between versions.

Your suggestion to have a global or session-level property is a good one. There are some cases
where I have thought about doing this. In general, it is not ideal because no other component
in Hive works this way, so having a one-off configuration for how LIKE statements
work is odd. Also, the query is no longer self-documenting: the system functions differently
based on how some parameter outside the query is set. Imagine if we had 10 such parameters. Could
the same query produce 100 different results based on permutations of properties?

For the most part, we model functionality in Hive on what MySQL does. You will find
a lot of compatibility in how UDFs work and in other language features.

There are many ways this can be dealt with; Hive has both 'like' and 'rlike'. If there is an SQL
standard on how LIKE must work, that might be ammo for the argument for changing the default,
but basing a change solely on how MySQL does something, just for new users, is not attractive.
MySQL has made its own bad choices over the years (non-standard things like enum, non-standard
date/time types, non-standard ways to specify indexes).

I am guessing that Hive's LIKE is the way it is because Hive initially only supported Java's
UTF-8 strings, and that comparison is case sensitive by default.
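
To illustrate the point about Java string comparison, here is a minimal sketch. It is
not Hive's UDFLike code; the class LikeSketch and its methods are hypothetical names
made up for this example. It assumes only that '%' matches any run of characters and
'_' matches any single character, and it ignores escape characters. Plain Java
matching is case sensitive unless a flag asks for something else:

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not how Hive's UDFLike is written.
final class LikeSketch {

    // Translate a SQL LIKE pattern into a Java regex:
    // '%' becomes ".*", '_' becomes ".", every other character is quoted literally.
    // Escape sequences are deliberately not handled in this sketch.
    static String likeToRegex(String likePattern) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < likePattern.length(); i++) {
            char c = likePattern.charAt(i);
            if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    // Match the whole value against the LIKE pattern.
    // Case-insensitive matching has to be requested explicitly.
    static boolean like(String value, String likePattern, boolean ignoreCase) {
        int flags = Pattern.DOTALL; // let '%' and '_' also match line breaks
        if (ignoreCase) {
            flags |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        }
        return Pattern.compile(likeToRegex(likePattern), flags)
                      .matcher(value)
                      .matches();
    }

    public static void main(String[] args) {
        System.out.println("Hive".equals("hive"));     // false
        System.out.println(like("Hive", "h%", false)); // false
        System.out.println(like("Hive", "h%", true));  // true
    }
}
```

In a query, the same effect can be had without changing any default by normalizing
both sides, for example with lower(), which keeps the query self-documenting.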
                
> Like operator in Hive is case sensitive while in MySQL (and most likely other DBs) it's case insensitive
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4070
>                 URL: https://issues.apache.org/jira/browse/HIVE-4070
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.10.0
>            Reporter: Mark Grover
>            Assignee: Mark Grover
>            Priority: Trivial
>
> Hive's like operator seems to be case sensitive.
> See https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java#L164
> However, MySQL's like operator is case insensitive. I don't have other DB's (like PostgreSQL) installed and handy but I am guessing their LIKE is case insensitive as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
