pig-dev mailing list archives

From "Olga Natkovich (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2391) Bzip_2 test is broken
Date Tue, 06 Dec 2011 22:08:39 GMT

    [ https://issues.apache.org/jira/browse/PIG-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163899#comment-13163899 ]

Olga Natkovich commented on PIG-2391:
-------------------------------------

Looks like our comments crossed. So the issue is that Hadoop does not understand the .bz extension,
and you need to fake it by saying the file is actually bz2.
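
To illustrate the point about extensions, here is a minimal sketch of a suffix-based codec lookup. It is not Hadoop's or Pig's actual code: the class SuffixCodecLookup and its methods are hypothetical names made up for this example, and the assumption is only that a codec is selected by matching the end of the file name against registered extensions such as .bz2.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a suffix-based codec lookup; not Hadoop's actual API.
class SuffixCodecLookup {
    // Maps a file-name suffix (for example ".bz2") to a codec name.
    private final Map<String, String> codecsBySuffix = new LinkedHashMap<>();

    void register(String suffix, String codecName) {
        codecsBySuffix.put(suffix, codecName);
    }

    // Returns the codec whose suffix ends the file name, or null if none matches.
    String codecFor(String fileName) {
        for (Map.Entry<String, String> entry : codecsBySuffix.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SuffixCodecLookup lookup = new SuffixCodecLookup();
        lookup.register(".bz2", "bzip2");
        lookup.register(".gz", "gzip");

        // A name ending in .bz2 matches the registered bzip2 suffix.
        System.out.println(lookup.codecFor("intermediate.bz2")); // prints "bzip2"
        // A name ending in .bz does not end with ".bz2", so no codec is found.
        System.out.println(lookup.codecFor("intermediate.bz"));  // prints "null"
    }
}
```

Under that assumption, a file named intermediate.bz gets no codec from the lookup unless the caller treats the name as if it ended in .bz2.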
                
> Bzip_2 test is broken
> ---------------------
>
>                 Key: PIG-2391
>                 URL: https://issues.apache.org/jira/browse/PIG-2391
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10
>            Reporter: Olga Natkovich
>            Assignee: xuting zhao
>             Fix For: 0.10, 0.11
>
>         Attachments: PIG-2391.patch
>
>
> This test is currently commented out, but if you uncomment it, it fails with Pig 10 but runs successfully with Pig 9.
> Script:
> a = load '/homes/olgan/studenttab10k' using PigStorage() as (name, age, gpa);
> store a into 'intermediate.bz';
> b = load 'intermediate.bz';
> store b into 'final.bz';
> A couple of observations:
> (1) An identical script (represented by the Bzip_1 test) that uses the bz2 extension instead of bz succeeds in Pig 10
> (2) The problem occurs while reading intermediate.bz, which has a different size under Pig 9 and Pig 10
> (3) The problem can be reproduced in local mode with a small subset of the data in the file
> (4) The following stack trace is observed:
> 2011-12-01 13:53:12,280 [Thread-22] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> java.lang.RuntimeException: java.io.IOException: compressedStream EOF
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:237)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.<init>(PigRecordReader.java:109)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:119)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.io.IOException: compressedStream EOF
>         at org.apache.tools.bzip2r.CBZip2InputStream.cadvise(CBZip2InputStream.java:92)
>         at org.apache.tools.bzip2r.CBZip2InputStream.compressedStreamEOF(CBZip2InputStream.java:96)
>         at org.apache.tools.bzip2r.CBZip2InputStream.bsR(CBZip2InputStream.java:451)
>         at org.apache.tools.bzip2r.CBZip2InputStream.initBlock(CBZip2InputStream.java:348)
>         at org.apache.tools.bzip2r.CBZip2InputStream.<init>(CBZip2InputStream.java:220)
>         at org.apache.pig.bzip2r.Bzip2TextInputFormat$BZip2LineRecordReader.<init>(Bzip2TextInputFormat.java:105)
>         at org.apache.pig.bzip2r.Bzip2TextInputFormat.createRecordReader(Bzip2TextInputFormat.java:244)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:227)
>         ... 5 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
