pig-user mailing list archives

From Kris Coward <k...@melon.org>
Subject Re: Problems loading a datafile..
Date Thu, 03 Mar 2011 04:46:02 GMT

Yep. That did it. Now if you don't mind my asking, is there any way to
direct LzoTokenizedStorage to put that extension on the part files when
it's writing them in the first place?

-K
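
The workaround confirmed in this thread is to rename the existing part files so that they end in .lzo. A minimal sketch of planning that rename follows. It assumes the output files use Hadoop's usual part-* naming; the function names lzo_rename_plan and as_hadoop_commands are hypothetical helpers, not part of Pig or elephant-bird. The generated lines use the standard "hadoop fs -mv" shell command and are only printed, never executed.

```python
import posixpath


def lzo_rename_plan(paths, suffix=".lzo"):
    """Return (src, dst) pairs for part files that lack the suffix.

    Hypothetical helper: assumes output files are named part-* as Hadoop
    usually names them; anything else (for example _logs) is left alone.
    """
    plan = []
    for path in paths:
        name = posixpath.basename(path)
        if name.startswith("part-") and not name.endswith(suffix):
            plan.append((path, path + suffix))
    return plan


def as_hadoop_commands(plan):
    """Render each planned rename as a 'hadoop fs -mv' command line."""
    return ["hadoop fs -mv {} {}".format(src, dst) for src, dst in plan]


if __name__ == "__main__":
    # Example listing; these part-file names are assumed for illustration.
    listing = [
        "/hadooptest/lzofile/part-00000",
        "/hadooptest/lzofile/part-00001.lzo",
        "/hadooptest/lzofile/_logs",
    ]
    for command in as_hadoop_commands(lzo_rename_plan(listing)):
        print(command)
```

The listing itself could come from "hadoop fs -lsr" on the directory in question; review the printed commands before running any of them.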

On Wed, Mar 02, 2011 at 03:17:09PM -0800, Dmitriy Ryaboy wrote:
> Oh.
> Yea we expect LZO files to have a .lzo extension.
> 
> D
> 
> On Wed, Mar 2, 2011 at 12:16 PM, Kris Coward <kris@melon.org> wrote:
> 
> >
> > I might still be missing something useful (we're running elephant-bird
> > from the gpl-packing distribution, and I've registered most of the
> > jarfiles from it), but the stack trace has changed a little, so now
> > it's producing:
> >
> > Backend error message during job submission
> > -------------------------------------------
> > org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: hdfs://master.hadoop:9000/hadooptest/lzofile
> >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
> >        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
> >        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> >        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> >        at java.lang.Thread.run(Thread.java:662)
> > Caused by: org.apache.pig.PigException: ERROR 0: no files found a path hdfs://master.hadoop:9000/hadooptest/lzofile
> >        at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.slice(Unknown Source)
> >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:260)
> >        ... 7 more
> >
> > Pig Stack Trace
> > ---------------
> > ERROR 2997: Unable to recreate exception from backend error:
> > org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: hdfs://master.hadoop:9000/hadooptest/lzofile
> >
> > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias test4
> >        at org.apache.pig.PigServer.openIterator(PigServer.java:482)
> >        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
> >        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
> >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
> >        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
> >        at org.apache.pig.Main.main(Main.java:352)
> > Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: hdfs://master.hadoop:9000/hadooptest/lzofile
> >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:176)
> >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:253)
> >        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
> >        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:781)
> >        at org.apache.pig.PigServer.store(PigServer.java:529)
> >        at org.apache.pig.PigServer.openIterator(PigServer.java:465)
> >        ... 6 more
> >
> > ================================================================================
> >
> > The "ERROR 0: no files found a path
> > hdfs://master.hadoop:9000/hadooptest/lzofile"
> > message has me really puzzled because in grunt I can see the files, I
> > can copy them to local, I can rename them with .lzo on the end,
> > uncompress them, and see the data that I expect, and I can even load
> > them with PigLoader (though obviously the data's all wrong when I do
> > that).
> >
> > Any more tips?
> >
> > Thanks,
> > Kris
> >
> > On Wed, Mar 02, 2011 at 09:32:47AM -0800, Dmitriy Ryaboy wrote:
> > > Off the top of my head, I can't think of anything, but you can just grab
> > > everything in Elephant-Bird's lib/ directory and make sure it's on the
> > > classpath on all the task trackers and your client machine (you can
> > > propagate it to the TTs via the register keyword if you don't want to bug
> > > your hadoop sysadmin and restart things).
> > >
> > > D
> > >
> > > On Wed, Mar 2, 2011 at 9:25 AM, Kris Coward <kris@melon.org> wrote:
> > >
> > > >
> > > > Nope; they're reproduced across all the machines. Does the
> > > > LzoTokenizedLoader class have any dependencies that LzoTokenizedStorage
> > > > doesn't (which I may be overlooking)?
> > > >
> > > > -K
> > > >
> > > > On Tue, Mar 01, 2011 at 07:17:10PM -0500, Kris Coward wrote:
> > > > >
> > > > > What's peculiar is that the test script for the loader class that was
> > > > > run a week ago seems also to be failing with the same error. We've added
> > > > > nodes to the cluster; maybe the relevant .jar files haven't been copied
> > > > > over to those nodes. I'll bug our sysadmin about that..
> > > > >
> > > > > Thanks,
> > > > > Kris
> > > > >
> > > > > On Tue, Mar 01, 2011 at 02:08:32PM -0800, Dmitriy Ryaboy wrote:
> > > > > > Kris,
> > > > > > Check the pig log file. Often "unable to create input slice" is caused by
> > > > > > errors such as not being able to find your loader class, or some dependency
> > > > > > of your loader class.
> > > > > >
> > > > > > D
> > > > > >
> > > > > > On Tue, Mar 1, 2011 at 1:48 PM, Kris Coward <kris@melon.org> wrote:
> > > > > >
> > > > > > >
> > > > > > > I get the output:
> > > > > > >
> > > > > > > rw-r--r--   2 kris supergroup     172694 2011-02-25 01:59
> > > > > > > /path/to/file/item/ex/subdir
> > > > > > >
> > > > > > > -K
> > > > > > >
> > > > > > > On Tue, Mar 01, 2011 at 12:46:31PM -0800, Dmitriy Ryaboy wrote:
> > > > > > > > What happens when you "hadoop fs -lsr" those paths?
> > > > > > > >
> > > > > > > > D
> > > > > > > >
> > > > > > > > On Sun, Feb 27, 2011 at 7:47 PM, Kris Coward <kris@melon.org> wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > So I finally got a couple of test scripts running on my cluster to take
> > > > > > > > > a sample data file, load it, do a little processing, store it, load it,
> > > > > > > > > do a little more processing, and dump the results.
> > > > > > > > >
> > > > > > > > > Once these were working, I set to parsing and storing some real data,
> > > > > > > > > but then got an "Unable to create input slice" error when trying to load
> > > > > > > > > this data back out again. This happened with each of:
> > > > > > > > >
> > > > > > > > > foo = LOAD '/path/to/file/{item,list,glob}/*/subdir' USING
> > > > > > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader(',') AS (schema:...);
> > > > > > > > > foo = LOAD '/path/to/file/item/*/subdir' USING
> > > > > > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader(',') AS (schema:...);
> > > > > > > > > foo = LOAD '/path/to/file/item/ex/subdir' USING
> > > > > > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader(',') AS (schema:...);
> > > > > > > > >
> > > > > > > > > and yielded the error (the same each time, except for the name/glob
> > > > > > > > > used):
> > > > > > > > >
> > > > > > > > > ERROR 2997: Unable to recreate exception from backend error:
> > > > > > > > > org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
> > > > > > > > > create input slice for: hdfs://master.hadoop:9000//path/to/file/item/ex/subdir
> > > > > > > > > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
> > > > > > > > > open iterator for alias foo
> > > > > > > > >        at org.apache.pig.PigServer.openIterator(PigServer.java:482)
> > > > > > > > >        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
> > > > > > > > >        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
> > > > > > > > >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> > > > > > > > >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
> > > > > > > > >        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
> > > > > > > > >        at org.apache.pig.Main.main(Main.java:352)
> > > > > > > > > Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
> > > > > > > > > 2997: Unable to recreate exception from backend error:
> > > > > > > > > org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
> > > > > > > > > create input slice for: hdfs://master.hadoop:9000/path/to/file/item/ex/subdir
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:176)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:253)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
> > > > > > > > >        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:781)
> > > > > > > > >        at org.apache.pig.PigServer.store(PigServer.java:529)
> > > > > > > > >        at org.apache.pig.PigServer.openIterator(PigServer.java:465)
> > > > > > > > >        ... 6 more
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Anyone have any suggestions why this may be happening and how to fix it?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Kris
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Kris Coward                                     http://unripe.melon.org/
> > > > > > > > > GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
> > > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Kris Coward                                     http://unripe.melon.org/
> > > > > > > GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
> > > > > > >
> > > > >
> > > > > --
> > > > > Kris Coward                                     http://unripe.melon.org/
> > > > > GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
> > > >
> > > > --
> > > > Kris Coward                                     http://unripe.melon.org/
> > > > GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
> > > >
> >
> > --
> > Kris Coward                                     http://unripe.melon.org/
> > GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
> >

-- 
Kris Coward					http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
