hive-user mailing list archives

From 丁桂涛(桂花) <dinggui...@baixing.com>
Subject Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery
Date Thu, 24 Jul 2014 07:52:55 GMT
Yeah. After setting hive.cache.expr.evaluation=false, all queries output
the expected results.

I also found that the problem is related to the getDisplayString function
in the UDF. At first, the function returned the same string regardless of
its parameters, and I had to set hive.cache.expr.evaluation=false to get
correct results.

But after I changed the function to return a string that depends on its
parameters, all queries returned the expected results even when
hive.cache.expr.evaluation was set to true.
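
To illustrate why the display string matters, here is a minimal sketch of the
behaviour described above. It is a simplified model and not Hive's actual
code: it assumes the expression cache is keyed by the string that
getDisplayString returns. The class EvalCache and the helpers
constantDisplay, parameterDisplay and run are hypothetical names made up for
this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class DisplayStringCacheSketch {

    // Hypothetical stand-in for an expression cache keyed by display string.
    static final class EvalCache {
        private final Map<String, String> cache = new HashMap<>();

        // Returns the cached result for this display string, computing it only once.
        String evaluate(String displayString, Supplier<String> compute) {
            return cache.computeIfAbsent(displayString, k -> compute.get());
        }
    }

    // Ignores its arguments, like the first version of the UDF's getDisplayString.
    static String constantDisplay(String[] children) {
        return "getad()";
    }

    // Includes its arguments, like the corrected getDisplayString.
    static String parameterDisplay(String[] children) {
        return "getad(" + String.join(", ", children) + ")";
    }

    // Models the three getad(map_col, key) calls from the query sharing one cache.
    static String[] run(boolean displayDependsOnParameters) {
        EvalCache cache = new EvalCache();
        String[] keys = {"tp", "p", "sp"};
        String[] results = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            final String key = keys[i];
            String[] children = {"map_col", "'" + key + "'"};
            String display = displayDependsOnParameters
                    ? parameterDisplay(children)
                    : constantDisplay(children);
            // The "real" evaluation simply returns the key, as in the thread's example row.
            results[i] = cache.evaluate(display, () -> key);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", run(false))); // prints tp,tp,tp
        System.out.println(String.join(",", run(true)));  // prints tp,p,sp
    }
}
```

With a constant display string, all three calls collide on one cache entry
and the first result is reused, which matches the 'tp', 'tp', 'tp' output.
With a parameter-dependent display string, each call gets its own entry.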

Thanks, Navis. That really helped me a lot.

Best Regards,

Guitao


On Thu, Jul 24, 2014 at 2:55 PM, Navis류승우 <navis.ryu@nexr.com> wrote:

> Looks like it's caused by HIVE-7314. Could you try that with
> "hive.cache.expr.evaluation=false"?
>
> Thanks,
> Navis
>
>
> 2014-07-24 14:34 GMT+09:00 丁桂涛(桂花) <dingguitao@baixing.com>:
>
>> Yes. The output is correct: ["tp","p","sp"].
>>
>> I developed the UDF in Java using Eclipse and exported the jar file into
>> the auxlib directory of Hive. Then I added the following line to the
>> ~/.hiverc file:
>>
>> create temporary function getad as 'xxxxxxx';
>>
>> The Hive version is 0.12.0. Perhaps the problem is caused by a
>> mis-optimization in Hive.
>>
>>
>> On Thu, Jul 24, 2014 at 1:11 PM, Jie Jin <hellojinjie@gmail.com> wrote:
>>
>>> Have you tried this query without the UDF, say:
>>>
>>>
>>> select
>>>   array(tp, p, sp) as ps
>>> from
>>>   (
>>>   select
>>>     'tp' as tp,
>>>     'p' as p,
>>>     'sp' as sp
>>>   from
>>>     table_name
>>>   where
>>>     id = xxxx
>>>   ) t;
>>>
>>>
>>> And how did you implement the UDF?
>>>
>>>
>>> Thanks,
>>> 金杰 (Jie Jin)
>>>
>>>
>>> On Wed, Jul 23, 2014 at 1:34 PM, 丁桂涛(桂花) <dingguitao@baixing.com> wrote:
>>>
>>>> Recently I developed a Hive Generic UDF *getad*. It takes a map
>>>> parameter and a string parameter and returns a string value. But I
>>>> found that its output differs confusingly depending on how it is called.
>>>>
>>>> Condition A:
>>>>
>>>>
>>>> select
>>>>   getad(map_col, 'tp') as tp,
>>>>   getad(map_col, 'p') as p,
>>>>   getad(map_col, 'sp') as sp
>>>> from
>>>>   table_name
>>>> where
>>>>   id = xxxx;
>>>>
>>>> The output is right: 'tp', 'p', 'sp'.
>>>>
>>>> Condition B:
>>>>
>>>>
>>>> select
>>>>   array(tp, p, sp) as ps
>>>> from
>>>>   (
>>>>   select
>>>>     getad(map_col, 'tp') as tp,
>>>>     getad(map_col, 'p') as p,
>>>>     getad(map_col, 'sp') as sp
>>>>   from
>>>>     table_name
>>>>   where
>>>>     id = xxxx
>>>>   ) t;
>>>>
>>>> The output is wrong: 'tp', 'tp', 'tp'. And the following query outputs
>>>> the same result:
>>>>
>>>>
>>>> select
>>>>   array(
>>>>     getad(map_col, 'tp'),
>>>>     getad(map_col, 'p'),
>>>>     getad(map_col, 'sp')
>>>>   ) as ps
>>>> from
>>>>   table_name
>>>> where
>>>>   id = xxxx;
>>>>
>>>> Could you please provide me some hints on this? Thanks!
>>>>
>>>> --
>>>> 丁桂涛
>>>>
>>>
>>>
>>
>>
>> --
>> 丁桂涛
>>
>
>


-- 
丁桂涛
