pig-user mailing list archives

From Mridul Muralidharan <mrid...@yahoo-inc.com>
Subject Re: Writing filter function that takes constructor param?
Date Thu, 02 Dec 2010 09:15:08 GMT

As of now, UDFs can only take Strings as constructor parameters.
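For example, the TopUDF from the question quoted below could accept its two numbers as Strings and parse them itself. The define statement then passes quoted string literals: define AggregateOthers com.ebay.ewa2.pig.load.TopUDF('3', '0'); A minimal sketch, assuming the two values are plain integers. A real UDF would extend org.apache.pig.EvalFunc; that base class and exec() are left out here so the example compiles without Pig, and the getters are hypothetical helpers added for illustration.

```java
// Sketch only: a real Pig UDF would extend org.apache.pig.EvalFunc<T>.
// The base class is omitted so this compiles without Pig on the classpath.
public class TopUDF {
    private final int m_cnt;   // number of top entries to keep
    private final int m_type;  // type selector from the original question

    // Pig passes define arguments to the constructor as Strings,
    // so the constructor takes Strings and converts them itself.
    public TopUDF(String top, String type) {
        this.m_cnt = Integer.parseInt(top.trim());
        this.m_type = Integer.parseInt(type.trim());
    }

    // Hypothetical getters, added only to make the parsed values visible.
    public int getCount() { return m_cnt; }
    public int getType() { return m_type; }

    public static void main(String[] args) {
        // Mirrors: define AggregateOthers com.ebay.ewa2.pig.load.TopUDF('3', '0');
        TopUDF udf = new TopUDF("3", "0");
        System.out.println(udf.getCount() + " " + udf.getType()); // prints "3 0"
    }
}
```

Parsing in the constructor means the conversion happens once, when Pig instantiates the UDF, rather than on every record.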


Regards,
Mridul

On Thursday 02 December 2010 02:18 PM, Sheeba George wrote:
> Hi Daniel
>    I have a related question. My UDF has a constructor that takes two parameters:
>
> public TopUDF(int top, int type) {
>     m_cnt = top;
>     m_type = type;
> }
>
> But when I instantiate it using the statement below, I get an error. Am I
> doing something wrong?
>
> define AggregateOthers com.ebay.ewa2.pig.load.TopUDF(3,0);
> Thanks
> Sheeba
>
> On Tue, Nov 30, 2010 at 1:45 PM, Zach Bailey <zach.bailey@dataclip.com> wrote:
>
>>
>>   Thanks Daniel. Of course, you are right. Turns out I had a bug elsewhere
>> in my UDF that was making me think this was not working correctly. After
>> fixing that bug the "define ..." works fine.
>>
>>
>> Using the following works great:
>>
>>
>> define INITIALIZED_UDF com.my.udfs.UDF(constructor_params);
>>
>> Thanks,
>> Zach
>>
>>
>> On Tuesday, November 30, 2010 at 4:40 PM, Daniel Dai wrote:
>>
>>> Pig always instantiates the UDF using the constructor parameters given in
>>> the "define" statement. CONTAINS_STRINGS(haystack) only passes haystack to
>>> CONTAINS_STRINGS.exec(); it does not re-initialize the UDF.
>>>
>>> Daniel
>>>
>>> Zach Bailey wrote:
>>>
>>>>   I am trying to do what seems like it should be a simple task using Pig
>>>>   and a UDF I have written, but I can't figure out the syntax to get it
>>>>   working.
>>>>
>>>>
>>>>   At a high level, I have a UDF that takes a number of strings, and I
>>>>   want to check whether any of them occur in some other strings. So I
>>>>   wrote a UDF called CONTAINS_ANY that I would like to initialize with the
>>>>   "needles" and then use Pig to distribute this search out via Hadoop.
>>>>
>>>>
>>>>   The problem is that I can't figure out the correct syntax to initialize
>>>>   the UDF with the "needles" and then use the UDF later, once it has been
>>>>   initialized. I have tried the following syntax:
>>>>
>>>>
>>>>   define CONTAINS_STRINGS com.my.piggybank.CONTAINS_ANY('string1|string2');
>>>>
>>>>
>>>>   and then invoking this by doing
>>>>
>>>>
>>>>   filtered = FILTER data BY CONTAINS_STRINGS(haystack);
>>>>
>>>>
>>>>   but this ends up re-initializing the UDF with the strings from the
>>>>   haystack, which is not what I wanted.
>>>>
>>>>
>>>>   Essentially, I want to be able to write a UDF that works like the
>>>>   built-in MATCHES operator, so I can say something like:
>>>>
>>>>
>>>>   filtered = FILTER data BY haystack CONTAINS_STRINGS('string1|string2');
>>>>
>>>>
>>>>   but so far I have been unable to find any useful or relevant
>>>>   documentation on how to accomplish this.
>>>>
>>>>
>>>>   Thanks a lot for any pointers or help anyone can give.
>>>>
>>>>
>>>>   Best,
>>>>   Zach
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
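Daniel's explanation in the thread (the define arguments go to the constructor once; CONTAINS_STRINGS(haystack) only feeds haystack to exec() for each record) can be illustrated with a minimal sketch of the CONTAINS_ANY idea. It assumes the needles arrive as one pipe-separated String, as in 'string1|string2'. A real filter UDF would extend org.apache.pig.FilterFunc and implement Boolean exec(Tuple input); here exec takes the haystack String directly so the example runs without Pig, and the class name ContainsAny is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: a real Pig filter UDF would extend org.apache.pig.FilterFunc
// and read the haystack from a Tuple; that is omitted so this runs standalone.
public class ContainsAny {
    private final List<String> needles;

    // Called once, with the literal from the define statement,
    // e.g. define CONTAINS_STRINGS ...CONTAINS_ANY('string1|string2');
    public ContainsAny(String pipeSeparatedNeedles) {
        // '|' is a regex metacharacter, so quote it before splitting.
        this.needles = Arrays.asList(pipeSeparatedNeedles.split(Pattern.quote("|")));
    }

    // Called once per record with the field named in the FILTER statement,
    // e.g. CONTAINS_STRINGS(haystack). The needles are never re-initialized.
    public boolean exec(String haystack) {
        if (haystack == null) {
            return false;
        }
        for (String needle : needles) {
            if (haystack.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ContainsAny udf = new ContainsAny("string1|string2");
        System.out.println(udf.exec("xx string2 yy")); // true
        System.out.println(udf.exec("nothing here"));  // false
    }
}
```

With a class like this behind the define, the FILTER data BY CONTAINS_STRINGS(haystack) form from the original question is enough; no MATCHES-style infix syntax is needed.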

