pig-user mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Re: Using dynamic invokers (InvokeForString)
Date Tue, 01 Mar 2011 18:04:14 GMT
patches accepted :-)

D
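A minimal JDK-only sketch of the distinction described in the thread below, assuming nothing about Pig's internals: a lookup by class name succeeds only for classes visible to the running JVM's class loader, so a jar that Pig merely plans to ship to the cluster does not help the client JVM. The class `ClasspathCheck` and its `isOnClasspath` helper are hypothetical; `tv.notube.TwitterExtractor` is the class from the thread.

```java
// Hypothetical helper: reports whether a class can be resolved by this
// JVM's class loader, i.e. whether its jar was on the classpath at startup.
public class ClasspathCheck {

    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only look the class up, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // JDK classes are always resolvable.
        System.out.println("java.lang.String: " + isOnClasspath("java.lang.String"));
        // Resolvable only if the jar containing it is on this JVM's classpath
        // (for the Pig client, that is what PIG_CLASSPATH provides).
        System.out.println("tv.notube.TwitterExtractor: "
                + isOnClasspath("tv.notube.TwitterExtractor"));
    }
}
```

Running it with and without the jar on the classpath (for example `java -cp .:twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar ClasspathCheck`) should flip the second line from false to true; PIG_CLASSPATH plays the equivalent role for the Pig client.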

On Tue, Mar 1, 2011 at 10:02 AM, Dan Brickley <danbri@danbri.org> wrote:

> On 1 March 2011 17:56, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:
> > Hi Dan,
> > iirc, registering a jar does not put it on the Pig client classpath; it
> > just tells Pig to ship the jar. You want to put it on the PIG_CLASSPATH
> > before you invoke pig.
>
> Perfect, that was exactly it. It's running now :)
>
> Would it make sense for REGISTER to augment the classpath? Or maybe
> better, for the error message to mention the role of PIG_CLASSPATH?
>
> cheers,
>
> Dan
>
> > On Tue, Mar 1, 2011 at 5:57 AM, Dan Brickley <danbri@danbri.org> wrote:
> >>
> >> I'm trying to use InvokeForString to call a simple static method that
> >> wraps the twitter-text Extractor class (API docs:
> >> http://mzsanford.github.com/twitter-text-java/docs/api/index.html,
> >> source: https://github.com/twitter/twitter-text-java), specifically its
> >> extractURLs method. In fact, since the logical result is a list of URLs,
> >> perhaps I should be writing a proper Pig-centric wrapper that returns a
> >> tuple, but for now I thought a stringified list would be OK for my
> >> immediate purpose: pulling out all the URLs from a corpus of tweets so
> >> we can expand the bit.ly and other short URLs...
> >>
> >> So I built the extra class (source below), packaged it inside the
> >> twitter-text jar, and verified that it's in there and usable as follows:
> >>
> >> danbri$ java -cp twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar \
> >>     tv.notube.TwitterExtractor "hello http://example.com/ http://example.org/ world"
> >> URLs: [http://example.com/, http://example.org/]
> >>
> >> Then, from the same directory, I try to run this as a Pig job:
> >>
> >> tw06 = load '/user/danbri/twitter/tweets2009-06.tab.txt.lzo' AS (
> >> when: chararray, who: chararray, msg: chararray);
> >> REGISTER twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar;
> >> DEFINE ExtractURLs InvokeForString('tv.notube.TwitterExtractor.urls',
> >> 'String');
> >> urls = FOREACH tw06 GENERATE ExtractURLs(msg);
> >> x = SAMPLE urls 0.001;
> >> dump x;
> >>
> >> ...but we don't get past InvokeForString:
> >>
> >> 2011-03-01 14:50:31,033 [main] ERROR org.apache.pig.tools.grunt.Grunt
> >> - ERROR 1000: Error during parsing. could not instantiate
> >> 'InvokeForString' with arguments '[tv.notube.TwitterExtractor.urls,
> >> String]'
> >> Details at logfile: /home/danbri/twitter/pig_1298987430385.log
> >> ...->
> >> Caused by: java.lang.reflect.InvocationTargetException
> >> Caused by: java.lang.ClassNotFoundException: tv.notube.TwitterExtractor
> >>
> >> I checked that Pig is finding the jar by misspelling the filename in
> >> the REGISTER line (which, as expected, makes things fail earlier). I
> >> also double-checked that the class is in the jar:
> >> danbri$ jar -tvf twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar | grep tv
> >>     0 Tue Mar 01 12:03:04 CET 2011 tv/
> >>     0 Tue Mar 01 12:03:04 CET 2011 tv/notube/
> >>  1114 Tue Mar 01 13:40:30 CET 2011 tv/notube/TwitterExtractor.class
> >>
> >> ...so I'm finding myself stuck. I'm sure the answer is staring me in
> >> the face, but I can't see it. Perhaps I should just do things properly
> >> with "extends EvalFunc<String>" and return the tuples separately
> >> anyway...
> >>
> >> Thanks for any pointers,
> >>
> >> Dan
> >>
> >>
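> >> package tv.notube;
> >>
> >> import com.twitter.Extractor;
> >> import java.util.List;
> >>
> >> // Public so that reflective callers outside this package (such as
> >> // Pig's InvokeForString) can invoke its methods.
> >> public class TwitterExtractor {
> >>
> >>   // Command-line check: prints the URLs found in the first argument.
> >>   public static void main(String[] args) {
> >>     if (args.length != 1) {
> >>       System.err.println("usage: TwitterExtractor <text>");
> >>       System.exit(1);
> >>     }
> >>     System.out.println("URLs: " + urls(args[0]));
> >>   }
> >>
> >>   // Returns the URLs in a tweet as a stringified list,
> >>   // e.g. "[http://example.com/, http://example.org/]".
> >>   public static String urls(String tweet) {
> >>     Extractor ex = new Extractor();
> >>     List<String> urls = ex.extractURLs(tweet);
> >>     return urls.toString();
> >>   }
> >> }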
> >
> >
>
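For readers wondering what a spec like 'tv.notube.TwitterExtractor.urls' with parameter types 'String' amounts to, here is a simplified, JDK-only model of invoking a static method by name. It is a sketch under stated assumptions, not Pig's own invoker code: the class `InvokeSketch`, the `invokeForString` helper, and the small type-name table are hypothetical. It splits the spec at the last dot into class and method names, maps the space-separated type names to Class objects, looks up a public method with getMethod, and calls it with a null receiver. The demo uses JDK methods so it runs without twitter-text; the last call reproduces the ClassNotFoundException from the thread when TwitterExtractor is not on the classpath.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical, simplified model of calling a static String-returning
// method by name, in the spirit of InvokeForString. Not Pig's implementation.
public class InvokeSketch {

    // Assumed mapping from type names (as written in a DEFINE) to Class objects.
    private static final Map<String, Class<?>> TYPES = Map.of(
            "String", String.class,
            "int", int.class,
            "long", long.class,
            "double", double.class);

    public static String invokeForString(String spec, String paramTypes, Object... args)
            throws ReflectiveOperationException {
        // "pkg.Class.method" -> class "pkg.Class", method "method"
        int dot = spec.lastIndexOf('.');
        String className = spec.substring(0, dot);
        String methodName = spec.substring(dot + 1);

        // "String int" -> {String.class, int.class}; blank -> no parameters
        String[] names = paramTypes.isBlank() ? new String[0] : paramTypes.trim().split("\\s+");
        Class<?>[] types = new Class<?>[names.length];
        for (int i = 0; i < names.length; i++) {
            Class<?> t = TYPES.get(names[i]);
            if (t == null) {
                throw new IllegalArgumentException("unsupported type: " + names[i]);
            }
            types[i] = t;
        }

        // Throws ClassNotFoundException if the class is not visible to this loader.
        Class<?> klass = Class.forName(className);
        // getMethod only finds public methods with exactly these parameter types.
        Method m = klass.getMethod(methodName, types);
        // Static method, so the receiver is null; boxed args are unboxed as needed.
        return (String) m.invoke(null, args);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invokeForString("java.lang.Integer.toHexString", "int", 255)); // ff
        System.out.println(invokeForString("java.lang.String.valueOf", "long", 42L));     // 42
        try {
            invokeForString("tv.notube.TwitterExtractor.urls", "String", "hello http://example.com/");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```

Note that Method.invoke also enforces Java access checks: a public static method declared on a package-private class is generally not callable this way from another package (it fails with IllegalAccessException unless setAccessible(true) is used), which is one more reason to declare a reflectively invoked helper class public.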
