pig-user mailing list archives

From Sheeba George <sheeba.geo...@gmail.com>
Subject Re: Writing filter function that takes constructor param?
Date Thu, 02 Dec 2010 08:48:14 GMT
Hi Daniel,
  I have a related question. My UDF has a constructor that takes two parameters:

public TopUDF(int top, int type) {
    m_cnt = top;
    m_type = type;
}

But when I instantiate it with the statement below, I get an error. Am I doing
something wrong?

define AggregateOthers com.ebay.ewa2.pig.load.TopUDF(3,0);
Thanks
Sheeba
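
One possible cause, offered as an assumption rather than something confirmed in this thread: Pig passes the arguments of a define statement to the UDF constructor as strings, so a constructor declared with int parameters would not be matched. Under that assumption the define line would quote its arguments, e.g. TopUDF('3', '0'), and the constructor would accept Strings and parse them. The sketch below is a standalone illustration, not Pig's actual code: DefineDemo and instantiate are hypothetical names, and TopUDF is a stand-in that does not extend any Pig class.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

public class DefineDemo {

    // Hypothetical stand-in for the TopUDF from this thread. A real UDF would
    // extend a Pig base class; that is omitted so the sketch compiles alone.
    static class TopUDF {
        private final int m_cnt;
        private final int m_type;

        // Assumption: define-statement arguments arrive as Strings,
        // so the constructor takes Strings and converts them itself.
        public TopUDF(String top, String type) {
            m_cnt = Integer.parseInt(top.trim());
            m_type = Integer.parseInt(type.trim());
        }

        public int getCount() { return m_cnt; }
        public int getType() { return m_type; }
    }

    // Simplified imitation of creating an object by reflection when every
    // constructor argument is a String. Not Pig's real implementation.
    static Object instantiate(Class<?> cls, String... args) throws Exception {
        Class<?>[] paramTypes = new Class<?>[args.length];
        Arrays.fill(paramTypes, String.class);
        // Throws NoSuchMethodException if only an (int, int) constructor exists.
        Constructor<?> ctor = cls.getConstructor(paramTypes);
        return ctor.newInstance((Object[]) args);
    }

    public static void main(String[] args) throws Exception {
        TopUDF udf = (TopUDF) instantiate(TopUDF.class, "3", "0");
        System.out.println(udf.getCount() + " " + udf.getType()); // prints "3 0"
    }
}
```

If the constructor were left as TopUDF(int, int), the getConstructor lookup in this sketch would fail, which may be similar to the error seen with the original define statement.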

On Tue, Nov 30, 2010 at 1:45 PM, Zach Bailey <zach.bailey@dataclip.com> wrote:

>
>  Thanks Daniel. Of course, you are right. Turns out I had a bug elsewhere
> in my UDF that was making me think this was not working correctly. After
> fixing that bug the "define ..." works fine.
>
>
> Using the following works great:
>
>
> define INITIALIZED_UDF com.my.udfs.UDF(constructor_params)
>
> Thanks,
> Zach
>
>
> On Tuesday, November 30, 2010 at 4:40 PM, Daniel Dai wrote:
>
> > Pig always instantiates the UDF using the constructor parameters given in the
> > "define" statement. CONTAINS_STRINGS(haystack) only passes haystack to
> > CONTAINS_STRINGS.exec(); it does not re-initialize the UDF.
> >
> > Daniel
> >
> > Zach Bailey wrote:
> >
> > >  I am trying to do what seems like it should be a simple task using Pig
> > >  and a UDF I have written, but I can't seem to figure out the syntax to
> > >  get it working.
> > >
> > >
> > >  At a high level, I have a UDF that takes a number of strings, and I want
> > >  to check whether any of them occur in some other strings. So I wrote a
> > >  UDF called CONTAINS_ANY that I would like to initialize with the
> > >  "needles" and then use Pig to distribute this search via Hadoop.
> > >
> > >
> > >  The problem is I can't figure out what the correct syntax is to
> > >  initialize the UDF with the "needles" and then use the UDF later once it
> > >  has been initialized. I have tried the following syntax:
> > >
> > >
> > >  define CONTAINS_STRINGS com.my.piggybank.CONTAINS_ANY('string1|string2');
> > >
> > >
> > >  and then invoking this by doing
> > >
> > >
> > >  filtered = FILTER data BY CONTAINS_STRINGS(haystack);
> > >
> > >
> > >  but this ends up re-initializing the UDF with the strings from the
> > >  haystack, which is not what I wanted.
> > >
> > >
> > >  Essentially I want to be able to write a UDF that is like the built-in
> > >  MATCHES function so I can say something like:
> > >
> > >
> > >  filtered = FILTER data by haystack CONTAINS_STRINGS('string1|string2');
> > >
> > >
> > >  but so far have been unable to find any useful/relevant documentation
> > >  on how to accomplish this.
> > >
> > >
> > >  Thanks a lot for any pointers or help anyone can give.
> > >
> > >
> > >  Best,
> > >  Zach


-- 
Sheeba Ann George
