hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: Discrepancy in String matching between Teradata and HIVE
Date Fri, 27 Mar 2015 08:55:23 GMT
Hive is only PRO SQL compliance,

In hive the string comparisons work just like they would work in java

so in hive

"BUILDING" = "BUILDING"
"BUILDING " != "BUILDING" (extra space added)

On Fri, Mar 27, 2015 at 2:11 PM, @Sanjiv Singh <sanjiv.is.on@gmail.com>
wrote:

> Hi,
>
> I can use rtrim function, i.e:
>
> select id,name,CONCAT ('"' , status , '"') from customer WHERE rtrim(status) = 'BUILDING'
LIMIT 2;
>
> But question raised what standard in string comparison Hive uses?
> According to ANSI/ISO SQL-92 'BUILDING' == 'BUILDING ', Here is a link
> <http://support.microsoft.com/en-us/kb/316626> for an article about it.
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Fri, Mar 27, 2015 at 1:41 PM, Nitin Pawar <nitinpawar432@gmail.com>
> wrote:
>
>> Hive does not manipulate data by its own, if your processing logic needs
>> the trimming of spaces then you can provide that in query.
>>
>>
>>
>> On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh <sanjiv.is.on@gmail.com>
>> wrote:
>>
>>>
>>>   Hi All,
>>>
>>> I am getting into Hive and learning hive. I have customer table in
>>> teradata , used sqoop to extract complete table in hive which worked fine.
>>>
>>> See below customer table both in Teradata and HIVE.
>>>
>>> *In Teradata :*
>>>
>>>     select TOP 4 id,name,'"'||status||'"' from customer;
>>>
>>>     3172460     Customer#003172460  "BUILDING  "
>>>     3017726     Customer#003017726  "BUILDING  "
>>>     2817987     Customer#002817987  "COMPLETE  "
>>>     2817984     Customer#002817984  "BUILDING  "
>>>
>>> *In HIVE :*
>>>
>>>     select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
>>>
>>>     3172460     Customer#003172460  "BUILDING  "
>>>     3017726     Customer#003017726  "BUILDING  "
>>>     2817987     Customer#002817987  "COMPLETE  "
>>>     2817984     Customer#002817984  "BUILDING  "
>>>
>>> When I tried to fetch records from table customer with column matching
>>> which is of String type. I am getting different result for same query in
>>> different environment.
>>>
>>> See below query results..
>>>
>>> *In Teradata :*
>>>
>>>     select TOP 2 id,name,'"'||status||'"' from customer WHERE status = 'BUILDING';
>>>
>>>     3172460     Customer#003172460  "BUILDING  "
>>>     3017726     Customer#003017726  "BUILDING  "
>>>
>>> *In HIVE :*
>>>
>>>     select id,name,CONCAT ('"' , status , '"') from customer WHERE status = 'BUILDING'
LIMIT 2;
>>>
>>>     ***<<No Result>>***
>>>
>>> It seems that teradata is doing trimming short of thing before actually
>>> comparing stating values. But Hive is matching strings as it is.
>>>
>>> Not sure, It is expected behaviour or bug or can be raised as
>>> enhancement.
>>>
>>> I see below possible solution:
>>>
>>>    - Convert into like operator expression with wildcard character
>>>    before and after
>>>
>>> Looking forward for your response on this. How can it be
>>> handled/achieved in hive.
>>>
>>> Regards
>>> Sanjiv Singh
>>> Mob :  +091 9990-447-339
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


-- 
Nitin Pawar

Mime
View raw message