hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Difference between like %A% and %a%
Date Fri, 24 May 2013 15:58:01 GMT
It is not as simple a problem as you think. MySQL has the same problem;
it is just that almost everyone uses a default charset and comparator.

http://www.bluebox.net/about/blog/2009/07/mysql_encoding/

How do you account for foreign characters like the a~, etc.? Is that
greater than A and less than <
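
To make the ordering question concrete, here is a minimal Python sketch. It is an illustration only, not Hive's or MySQL's implementation. It assumes plain code-point comparison, which is what a charset-unaware (binary) comparator does; a locale-aware collation may order or equate these characters differently. The helper name `contains_ci` and the sample strings are made up for the example.

```python
# Sample characters; "\u00e3" is a-tilde, "\u00c3" is its uppercase form.
chars = ["a", "A", "\u00e3", "Z", "z"]

# sorted() on str compares Unicode code points, so every ASCII letter
# sorts before a-tilde, and uppercase sorts before lowercase.
by_code_point = sorted(chars)


def contains_ci(haystack: str, needle: str) -> bool:
    """Case-insensitive substring test using Unicode case folding."""
    return needle.casefold() in haystack.casefold()


print(by_code_point)
# Case folding maps uppercase A-tilde to lowercase a-tilde ...
print(contains_ci("S\u00c3O PAULO", "s\u00e3o"))
# ... but it does not strip the accent, so plain "A" is still a different letter.
print(contains_ci("SAO PAULO", "s\u00e3o"))
```

Whether an accented letter should compare equal to its unaccented form is a collation decision, which is why lowercasing alone does not settle the question.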


On Fri, May 24, 2013 at 11:41 AM, Dean Wampler <deanwampler@gmail.com> wrote:

> If backwards compatibility wasn't an issue, the hive code that implements
> LIKE could be changed to convert the fields and LIKE strings to lower case
> before comparing ;) Of course, there is overhead doing that.
>
> On Fri, May 24, 2013 at 9:50 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
>> Also I am thinking that the rlike is based on regex and can be told to do
>> case insensitive matching.
>>
>>
>> On Fri, May 24, 2013 at 9:16 AM, Dean Wampler <deanwampler@gmail.com> wrote:
>>
>>> Hortonworks has announced plans to make Hive more SQL compliant. I
>>> suspect bugs like this will be addressed sooner or later. It will be
>>> necessary to handle backwards compatibility, but that could be handled with
>>> a Hive property that selects one behavior or the other.
>>>
>>> On Fri, May 24, 2013 at 8:07 AM, John Omernik <john@omernik.com> wrote:
>>>
>>>> I have mentioned this before, and I think this is a big miss by the Hive
>>>> team. LIKE, by default in many SQL RDBMSs (such as MSSQL or MySQL), is not
>>>> case sensitive. Thus when new users move over to Hive and see a command
>>>> like "like", they will assume it behaves the same way (as many other
>>>> SQL-like qualities do), and false negatives may ensue. Even though it's
>>>> different by default (I am OK with this ... I guess; my personal preference
>>>> is that it matches the defaults on other systems, but in the end I am
>>>> fine with it, just grumbly :) ), give us the ability
>>>> to set that behavior in hive-site.xml. That way, when an org realizes
>>>> that it is different and their users are all getting false negatives, they
>>>> can just update hive-site.xml and fix the problem rather than having to
>>>> include it in training that may or may not work. I've added this comment
>>>> to https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278 for fun. :)
>>>>
>>>> Please? :)
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, May 24, 2013 at 7:53 AM, Dean Wampler <deanwampler@gmail.com> wrote:
>>>>
>>>>> Your where clause looks at the abbreviation, requiring 'A', not the
>>>>> state name. You got the correct answer.
>>>>>
>>>>>
>>>>> On Fri, May 24, 2013 at 6:21 AM, Sai Sai <saigraph@yahoo.in> wrote:
>>>>>
>>>>>> But it should get more results for this:
>>>>>>
>>>>>> %a%
>>>>>>
>>>>>> than for
>>>>>>
>>>>>> %A%
>>>>>>
>>>>>> Please let me know if i am missing something.
>>>>>> Thanks
>>>>>> Sai
>>>>>>
>>>>>>
>>>>>>    ------------------------------
>>>>>>  *From:* Jov <amutu@amutu.com>
>>>>>> *To:* user@hive.apache.org; Sai Sai <saigraph@yahoo.in>
>>>>>> *Sent:* Friday, 24 May 2013 4:39 PM
>>>>>> *Subject:* Re: Difference between like %A% and %a%
>>>>>>
>>>>>>
>>>>>> 2013/5/24 Sai Sai <saigraph@yahoo.in>
>>>>>>
>>>>>> abbreviation l
>>>>>>
>>>>>>
>>>>>> Unlike MySQL, strings in Hive are case sensitive, so '%A%' is not
>>>>>> equal to '%a%'.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jov
>>>>>> blog: http://amutu.com/blog
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dean Wampler, Ph.D.
>>>>> @deanwampler
>>>>> http://polyglotprogramming.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Dean Wampler, Ph.D.
>>> @deanwampler
>>> http://polyglotprogramming.com
>>
>>
>>
>
>
> --
> Dean Wampler, Ph.D.
> @deanwampler
> http://polyglotprogramming.com
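
The behaviour discussed in this thread can be modelled with a short Python sketch. It is only a model of LIKE semantics, not the Hive code that implements LIKE: it assumes `%` matches any run of characters and `_` matches exactly one, and it ignores escape characters. The function names and the sample abbreviations are hypothetical. It shows the case-sensitive result Sai saw, the lowercase-both-sides approach Dean describes, and the case-insensitive regex approach Edward mentions for rlike.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern into a regex: % -> .*, _ -> ., rest literal."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like(value: str, pattern: str, ignore_case: bool = False) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None


# Hypothetical sample data: state abbreviations are stored in upper case.
abbreviations = ["AL", "CA", "NY", "TX"]

# Case-sensitive matching: '%A%' finds rows, '%a%' finds none.
upper_hits = [a for a in abbreviations if like(a, "%A%")]
lower_hits = [a for a in abbreviations if like(a, "%a%")]

# Lowercase the field and the pattern before comparing.
lowered_hits = [a for a in abbreviations if like(a.lower(), "%a%".lower())]

# Case-insensitive regex matching.
regex_hits = [a for a in abbreviations if like(a, "%a%", ignore_case=True)]

print(upper_hits, lower_hits, lowered_hits, regex_hits)
```

In Hive itself, the corresponding workarounds would be along the lines of lower(abbreviation) LIKE '%a%' or abbreviation RLIKE '(?i)a', since RLIKE uses Java regular expressions. As noted above, lowercasing every value adds some overhead.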
