From: "Stu Hood"
To: user@cassandra.apache.org
Date: Fri, 13 Aug 2010 14:31:29 -0500 (CDT)
Subject: Re: Cassandra and Pig
Message-ID: <1281727889.216219404@192.168.2.227>
References: <1281718557.932823100@192.168.2.227>

Hmm, the example code there may not have been run in distributed mode recently, or perhaps Pig performs some magic to automatically register Jars containing classes directly referenced as UDFs.

-----Original Message-----
From: "Christian Decker"
Sent: Friday, August 13, 2010 12:16pm
To: user@cassandra.apache.org
Subject: Re: Cassandra and Pig

Wow, that was extremely quick, thanks Stu :-)
I'm still a bit unclear on what the pig_cassandra script does. It sets some
variables (PIG_CLASSPATH for one) and then starts the original pig binary,
injecting some libraries into it (libthrift and pig-core) but, strangely, not
the cassandra loadfunc. Why not?

Anyway, now I understand why I was getting different errors when executing
directly via Pig compared to through pig_cassandra. Still, I get an exception
whose origin I cannot explain (http://pastebin.com/JYfSSfny):

Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.ExceptionInInitializerError
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
 at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)

Any idea? Thanks again for your fast answer :)

On Fri, Aug 13, 2010 at 6:55 PM, Stu Hood wrote:

> That error is coming from the frontend: the jars must also be on the local
> classpath. Take a look at how contrib/pig/bin/pig_cassandra sets up
> $PIG_CLASSPATH.
>
> -----Original Message-----
> From: "Christian Decker"
> Sent: Friday, August 13, 2010 11:30am
> To: user@cassandra.apache.org
> Subject: Cassandra and Pig
>
> Hi all,
>
> I'm trying to get Pig to read data from a Cassandra cluster, which I
> thought would be trivial since Cassandra already provides me with the
> CassandraStorage class.
> The problem is that once I try executing a simple script like this:
>
> register /path/to/pig-0.7.0-core.jar;
> register /path/to/libthrift-r917130.jar;
> register /path/to/cassandra_loadfunc.jar;
> rows = LOAD 'cassandra://Keyspace1/Standard1' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
> cols = FOREACH rows GENERATE flatten($1);
> colnames = FOREACH cols GENERATE $0;
> namegroups = GROUP colnames BY $0;
> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> orderednames = ORDER namecounts BY $0;
> topnames = LIMIT orderednames 50;
> dump topnames;
>
> I just end up with a NoClassDefFoundError:
>
> ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias topnames
>  at org.apache.pig.PigServer.openIterator(PigServer.java:521)
>  at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
>  at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
>  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
>  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>  at org.apache.pig.Main.main(Main.java:391)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias topnames
>  at org.apache.pig.PigServer.store(PigServer.java:577)
>  at org.apache.pig.PigServer.openIterator(PigServer.java:504)
>  ... 6 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
>  at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
>  at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
>  at org.apache.pig.PigServer.store(PigServer.java:569)
>  ... 7 more
> Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.NoClassDefFoundError: org/apache/thrift/TBase
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
>  at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
>
> I cannot think of a reason why. As far as I understand it, Pig takes
> the jar files named in the script, unpacks them, creates the execution plan
> for the script itself, bundles everything into a single jar again, and then
> submits it to HDFS, from where it will be executed in Hadoop, right?
> I also checked that the class in question actually is in the libthrift jar,
> so what's going wrong?
>
> Regards,
> Chris
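
Editorial sketch, not part of the original thread: the advice that "the jars must also be on the local classpath" could look roughly like the following in a shell session. It is a minimal illustration under stated assumptions: the jar paths are the placeholders from the script in the thread, `example-script.pig` is a hypothetical file name, and this is not a copy of what contrib/pig/bin/pig_cassandra actually does.

```shell
#!/bin/sh
# Placeholder jar locations, taken from the script in the thread (assumptions).
THRIFT_JAR=/path/to/libthrift-r917130.jar
LOADFUNC_JAR=/path/to/cassandra_loadfunc.jar

# Put both jars at the front of PIG_CLASSPATH so the Pig frontend (the local JVM
# that plans and submits the job) can load them; keep any existing value after them.
PIG_CLASSPATH="$THRIFT_JAR:$LOADFUNC_JAR${PIG_CLASSPATH:+:$PIG_CLASSPATH}"
export PIG_CLASSPATH

# Show the resulting classpath for inspection.
echo "PIG_CLASSPATH=$PIG_CLASSPATH"

# Then run the script as usual (hypothetical file name, left commented out):
# pig example-script.pig
```

The `register` statements in the script would still be needed so that the jars are shipped with the job to the backend; the environment variable only covers the local side.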