hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-570) Large BZip files Seem to lose data in Pig
Date Wed, 07 Jan 2009 01:39:44 GMT

     [ https://issues.apache.org/jira/browse/PIG-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-570:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

patch committed; thanks, Ben!

> Large BZip files Seem to lose data in Pig
> -----------------------------------------
>
>                 Key: PIG-570
>                 URL: https://issues.apache.org/jira/browse/PIG-570
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch, 0.0.0, 0.1.0, site
>         Environment: Pig 0.1.1 / Linux / 8-node Hadoop 0.18.2 cluster
>            Reporter: Alex Newman
>             Fix For: types_branch, 0.1.0, site, 0.0.0
>
>         Attachments: bzipTest.bz2, PIG-570.patch
>
>
> So I don't believe bzip2 input to Pig is working, at least not with large files. It
> seems as though the map inputs are getting cut off: the maps complete far too quickly,
> and the row of data Pig tries to process is often randomly cut off and left incomplete.
> Here are my symptoms:
> - Maps seem to be completing at an unbelievably fast rate.
> With uncompressed data:
> Status: Succeeded
> Started at: Wed Dec 17 21:31:10 EST 2008
> Finished at: Wed Dec 17 22:42:09 EST 2008
> Finished in: 1hrs, 10mins, 59sec
>
> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
> map     100.00%     4670       0        0        4670      0       0 / 21
> reduce  57.72%      13         0        0        13        0       0 / 4
> With bzip-compressed data:
> Started at: Wed Dec 17 21:17:28 EST 2008
> Failed at: Wed Dec 17 21:17:52 EST 2008
> Failed in: 24sec
> Black-listed TaskTrackers: 2
>
> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
> map     100.00%     183        0        0        15        168     54 / 22
> reduce  100.00%     13         0        0        0         13      0 / 0
> The errors we get:
> java.lang.IndexOutOfBoundsException: Requested index 11 from tuple (rec	A, 0HAW, CHIX, )
> 	at org.apache.pig.data.Tuple.getField(Tuple.java:176)
> 	at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:84)
> 	at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:38)
> 	at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:223)
> 	at org.apache.pig.impl.eval.cond.CompCond.eval(CompCond.java:58)
> 	at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:60)
> 	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:117)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> attempt_200812161759_0045_m_000007_0	task_200812161759_0045_m_000007	tsdhb06.factset.com	FAILED
> java.lang.IndexOutOfBoundsException: Requested index 11 from tuple (rec	A, CSGN, VTX, VTX, 0, 20080303, 90919, 380, 1543, 206002)
> 	at org.apache.pig.data.Tuple.getField(Tuple.java:176)
> 	at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:84)
> 	at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:38)
> 	at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:223)
> 	at org.apache.pig.impl.eval.cond.CompCond.eval(CompCond.java:58)
> 	at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:60)
> 	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:117)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
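
Both traces fail the same way: a projection asks for index 11 (the twelfth field), but the tuples Pig built hold fewer than twelve fields (ten in the second trace). That fits the report that rows arrive truncated. One way to confirm the truncation happens before Pig parses anything is to count the fields on each line of the decompressed input (for example, the output of `bzcat` on the input file). The sketch below is a hypothetical stand-alone check, not part of Pig and not the committed PIG-570 fix; the class name `TruncatedRowCheck`, the comma delimiter in the example, and the minimum of 12 fields (inferred only from the requested index) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical diagnostic (not part of Pig): reports lines whose field
 * count is below what the script's projections need.
 */
public class TruncatedRowCheck {

    /** Returns the 1-based line numbers of rows with fewer than minFields fields. */
    public static List<Integer> findShortRows(Iterable<String> lines, char delimiter, int minFields) {
        // Quote the delimiter so characters such as '|' are not read as regex syntax.
        Pattern splitter = Pattern.compile(Pattern.quote(String.valueOf(delimiter)));
        List<Integer> shortRows = new ArrayList<>();
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            // Limit -1 keeps trailing empty fields, so "a,b," counts as 3 fields.
            if (splitter.split(line, -1).length < minFields) {
                shortRows.add(lineNo);
            }
        }
        return shortRows;
    }

    public static void main(String[] args) {
        // Synthetic rows (assumed comma-delimited): the first is complete, the second truncated.
        List<String> sample = List.of(
            "f0,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11", // 12 fields: index 11 exists
            "f0,f1,f2,");                            // 4 fields: index 11 would fail
        // Projecting index 11 needs at least 12 fields per row.
        System.out.println("short rows: " + findShortRows(sample, ',', 12));
    }
}
```

If the uncompressed copy of the same data reports no short rows while the bzip2 path still fails, the loss is happening in the decompression or split handling rather than in the data itself.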

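The reporter suspects that map inputs are being cut off. For line-oriented text, records survive split boundaries only if every reader follows one convention, modelled here on what Hadoop's `LineRecordReader` does for uncompressed input (exact boundary handling differs between Hadoop versions): a reader whose split does not start at offset 0 discards everything up to the first newline, because the previous reader owns that line, and every reader keeps going until it has finished the line that straddles its end offset. A reader that instead stopped exactly at its end offset would emit partial rows like the ones in the traces. The sketch below demonstrates the convention on an in-memory string. `SplitLineReader` and `readSplit` are hypothetical names, it assumes uncompressed, newline-terminated text, and this thread does not say whether a violation of this rule in Pig's bzip2 handling was the actual cause or what the committed patch changed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of boundary-safe line splitting (not Pig or Hadoop code). */
public class SplitLineReader {

    /**
     * Returns the lines owned by the split [start, end) of data: every line that
     * starts after the first newline at or beyond 'start' and no later than 'end'.
     */
    public static List<String> readSplit(String data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Not the first split: skip up to and including the next newline.
            // That line belongs to the previous split, which reads past its own end.
            int nl = data.indexOf('\n', start);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        // Read every line that starts at or before 'end', even if it finishes
        // beyond it, so no record is cut at the split boundary.
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            lines.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "row-1,a,b\nrow-2,c,d\nrow-3,e,f\n";
        int splitSize = 7; // deliberately not aligned with line boundaries
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length(); start += splitSize) {
            int end = Math.min(start + splitSize, data.length());
            all.addAll(readSplit(data, start, end));
        }
        // Each row appears exactly once and whole, despite the misaligned splits.
        System.out.println(all);
    }
}
```

Trying several split sizes, including ones that land mid-line or exactly on a line start, is a quick way to check that no row is lost or duplicated. Compressed input is harder, because byte offsets in a bzip2 stream do not map directly to line boundaries in the decompressed text.
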
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

