pig-dev mailing list archives

From Jonathan Coveney <jcove...@gmail.com>
Subject Re: Static variable in PIG UDF
Date Thu, 01 Mar 2012 16:39:51 GMT
I would be a bit clearer about what you want to do. It's kind of vague, and
the solution will vary depending on that. Keep in mind that in M/R (and thus in Pig),
you have no idea how many records will be given to a given mapper, and
thus, a given instance of your UDF in a given JVM. For all you know, it
could be 1 instance per row...this'd be super inefficient, but it'd mean
that there'd be no point in storing anything statically!
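
To make that concrete, here is a rough sketch in plain Java, not written against the real Pig UDF API. The NextValue class, its runTask helper, and the sample array are all made up for illustration. Each call to runTask stands in for one task with its own fresh instance. In a real job each task would also typically have its own JVM, so a static index would restart in the same way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a UDF; it does not extend Pig's EvalFunc.
class NextValue {
    private final String[] values;
    // Per-instance state. A static field would still only be shared within one JVM.
    private int index = 0;

    NextValue(String[] values) {
        this.values = values;
    }

    // Returns the next element, wrapping around at the end of the array.
    String exec() {
        String v = values[index];
        index = (index + 1) % values.length;
        return v;
    }

    // Simulates one task: a fresh instance processes `rows` records.
    static List<String> runTask(String[] values, int rows) {
        NextValue udf = new NextValue(values);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            out.add(udf.exec());
        }
        return out;
    }
}
```

With the array {"a", "b", "c"}, one task over four rows hands out a, b, c, a, while two tasks of two rows each both hand out a, b. Which value a row gets depends on how the rows happen to be split, and that is not something you control.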

There are ways to guarantee that all of the data will stream through one
UDF, namely, grouping, whether it be group all, or grouping by a key. In
this situation, though, you don't need static variables, as on a given
mapper the same instance of the class will do all of the processing. It
depends on your data and what you want to do on it, of course. But static
variables will not let you "communicate" between instances that are
processing data, because they are in different JVMs.
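
As a sketch of the grouped case, again in plain Java rather than the actual Pig API: the hypothetical AssignFromArray class below receives all the rows of a group in a single call, the way a UDF applied to a grouped bag would. The class name, the "row:value" output format, and the use of a List in place of a bag are assumptions made for the example, and it assumes the order of rows within the bag either does not matter or has been fixed beforehand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a UDF applied to a grouped bag; not the Pig API.
class AssignFromArray {
    private final String[] values;

    AssignFromArray(String[] values) {
        this.values = values;
    }

    // Receives every row of the group in one call, so a local counter is enough.
    List<String> exec(List<String> bag) {
        List<String> out = new ArrayList<>();
        int index = 0; // local to this call; nothing static is needed
        for (String row : bag) {
            out.add(row + ":" + values[index]);
            index = (index + 1) % values.length;
        }
        return out;
    }
}
```

Because the whole group passes through one call, the index never has to outlive the method, let alone be shared between instances.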

2012/3/1 Shibu Thomas <shibut@microsoft.com>

> Hi,
> I am trying to use a static variable in PIG UDF which will be invoked from
> a foreach statement.
> This static variable will be used as an index into an array to return the
> next value from the array.
> I want to understand the implications of the same.
> Thanks
> Shibu Thomas
> Office :  +91 (40) 669 32660
> Mobile: +91 95811 51116
