incubator-cassandra-user mailing list archives

From Christian Decker <decker.christ...@gmail.com>
Subject Re: Cassandra and Pig
Date Sun, 15 Aug 2010 20:44:49 GMT
I'm using Cassandra 0.6.3, but plan on switching to 0.7.0 later. While
compiling, I have a copy of the storage-conf.xml from the running cluster
available :-)

On Fri, Aug 13, 2010 at 9:51 PM, Stu Hood <stu.hood@rackspace.com> wrote:

> > Still I get an exception which I cannot explain where it comes
> > from (http://pastebin.com/JYfSSfny)
> Which version of Cassandra are you using? The 0.6 series requires that a
> valid storage-conf.xml is distributed with the job to specify
> connection/partitioner/etc information, but trunk/0.7-beta2 requires
> properties to be set by your startup script.
>
> -----Original Message-----
> From: "Stu Hood" <stu.hood@rackspace.com>
> Sent: Friday, August 13, 2010 2:31pm
> To: user@cassandra.apache.org
> Subject: Re: Cassandra and Pig
>
> Hmm, the example code there may not have been run in distributed mode
> recently, or perhaps Pig performs some magic to automatically register Jars
> containing classes directly referenced as UDFs.
>
> -----Original Message-----
> From: "Christian Decker" <decker.christian@gmail.com>
> Sent: Friday, August 13, 2010 12:16pm
> To: user@cassandra.apache.org
> Subject: Re: Cassandra and Pig
>
> Wow, that was extremely quick, thanks Stu :-)
> I'm still a bit unclear on what the pig_cassandra script does. It sets some
> variables (PIG_CLASSPATH for one) and then starts the original pig binary
> with some libraries injected (libthrift and pig-core), but strangely not
> the cassandra loadfunc. Why not?
>
> Anyway now I understand why I was getting different errors when executing
> directly via Pig compared to through pig_cassandra. Still I get an exception
> which I cannot explain where it comes from (http://pastebin.com/JYfSSfny):
>
> Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.ExceptionInInitializerError
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
>  at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
>
>
> Any idea? Thanks again for your fast answer :)
>
> On Fri, Aug 13, 2010 at 6:55 PM, Stu Hood <stu.hood@rackspace.com> wrote:
>
> > That error is coming from the frontend: the jars must also be on the local
> > classpath. Take a look at how contrib/pig/bin/pig_cassandra sets up
> > $PIG_CLASSPATH.
> >
> > -----Original Message-----
> > From: "Christian Decker" <decker.christian@gmail.com>
> > Sent: Friday, August 13, 2010 11:30am
> > To: user@cassandra.apache.org
> > Subject: Cassandra and Pig
> >
> > Hi all,
> >
> > I'm trying to get Pig to read data from a Cassandra cluster, which I
> > thought would be trivial, since Cassandra already provides the
> > CassandraStorage class. The problem is that once I try executing a simple
> > script like this:
> >
> > register /path/to/pig-0.7.0-core.jar;
> > register /path/to/libthrift-r917130.jar;
> > register /path/to/cassandra_loadfunc.jar;
> > rows = LOAD 'cassandra://Keyspace1/Standard1' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
> > cols = FOREACH rows GENERATE flatten($1);
> > colnames = FOREACH cols GENERATE $0;
> > namegroups = GROUP colnames BY $0;
> > namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> > orderednames = ORDER namecounts BY $0;
> > topnames = LIMIT orderednames 50;
> > dump topnames;
> >
> > I just end up with a NoClassDefFoundError:
> >
> > ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias topnames
> >  at org.apache.pig.PigServer.openIterator(PigServer.java:521)
> >  at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
> >  at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
> >  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
> >  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
> >  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> >  at org.apache.pig.Main.main(Main.java:391)
> > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias topnames
> >  at org.apache.pig.PigServer.store(PigServer.java:577)
> >  at org.apache.pig.PigServer.openIterator(PigServer.java:504)
> >  ... 6 more
> > Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
> >  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
> >  at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
> >  at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
> >  at org.apache.pig.PigServer.store(PigServer.java:569)
> >  ... 7 more
> > Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.NoClassDefFoundError: org/apache/thrift/TBase
> >  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
> >  at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
> >
> > I cannot think of a reason why. As far as I understand it, Pig takes the
> > jar files registered in the script, unpacks them, creates the execution
> > plan for the script itself, bundles everything into a single jar again, and
> > then submits it to HDFS, from where it is executed in Hadoop, right?
> > I also checked that the class in question actually is in the libthrift jar,
> > so what's going wrong?
> >
> > Regards,
> > Chris
> >
> >
> >
>
>
>
>
>
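
For readers following the thread: Stu's point that the jars must also be on the local (frontend) classpath can be sketched roughly as below. This is only an illustration, not the contents of contrib/pig/bin/pig_cassandra. The helper name build_classpath and the LIB_DIR variable are hypothetical, and /path/to/lib is a placeholder for wherever the libthrift, pig-core and loadfunc jars actually live.

```shell
#!/bin/sh
# Hypothetical helper: join every *.jar in a directory into a
# colon-separated classpath string.
build_classpath() {
    dir=$1
    cp=""
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded glob when the directory holds no jars.
        [ -e "$jar" ] || continue
        # Prefix a colon only when cp already has an entry.
        cp="${cp:+$cp:}$jar"
    done
    printf '%s\n' "$cp"
}

# Placeholder location (an assumption); point it at the real jar directory.
LIB_DIR=${LIB_DIR:-/path/to/lib}

# Pig's launcher reads PIG_CLASSPATH, so export it before starting pig.
PIG_CLASSPATH=$(build_classpath "$LIB_DIR")
export PIG_CLASSPATH
echo "PIG_CLASSPATH=$PIG_CLASSPATH"
```

With PIG_CLASSPATH exported this way, pig would be started from the same shell. Since Stu says the jars must "also" be on the local classpath, the register statements in the script presumably remain necessary for the backend.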
