hadoop-pig-dev mailing list archives

From "Viraj Bhat (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1304) Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input
Date Wed, 17 Mar 2010 18:41:27 GMT
Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input
-----------------------------------------------------------------------------------

                 Key: PIG-1304
                 URL: https://issues.apache.org/jira/browse/PIG-1304
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.6.0
            Reporter: Viraj Bhat


I have the following bzip2-compressed text files (\t denotes a TAB character):
{code}
$ bzcat A.txt.bz2
1\ta
2\taa

$ bzcat B.txt.bz2
1\tb
2\tbb

$ cat *.bz2 > test/mymerge.bz2
$ bzcat test/mymerge.bz2
1\ta
2\taa
1\tb
2\tbb

$ hadoop fs -put test/mymerge.bz2 /user/viraj

{code}

I then wrote a Pig script to print the contents of the merged bz2 file.

{code}
A = load '/user/viraj/bzipgetmerge/mymerge.bz2' using PigStorage();
dump A;
{code}

I get only the records from the first of the concatenated bz2 files:

(1,a)
(2,aa)
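
The truncated output is what one would expect from a decoder that stops at the first end-of-stream marker. The Python sketch below illustrates that behaviour outside Pig. It assumes, without confirmation from this report, that Pig's bz2 reader acts like a single-stream decoder. The sample data mirrors A.txt and B.txt above, and the variable names are illustrative only.

```python
import bz2

# Build two independent bzip2 streams, mirroring A.txt.bz2 and B.txt.bz2.
stream_a = bz2.compress(b"1\ta\n2\taa\n")
stream_b = bz2.compress(b"1\tb\n2\tbb\n")

# Byte-level concatenation, the equivalent of `cat *.bz2 > mymerge.bz2`.
merged = stream_a + stream_b

# BZ2Decompressor decodes exactly one stream and then stops.
single_stream_decoder = bz2.BZ2Decompressor()
first_only = single_stream_decoder.decompress(merged)

# bz2.decompress handles multi-stream input (as bzcat does).
all_records = bz2.decompress(merged)

print(first_only)   # b'1\ta\n2\taa\n'
print(all_records)  # b'1\ta\n2\taa\n1\tb\n2\tbb\n'
```

The single-stream decoder returns the same two records that the Pig script printed, while the multi-stream call returns all four, matching the bzcat output.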

My M/R jobs do not fail or emit any warning about this; they silently drop the records from
the remaining streams. Is there a way to throw a warning or fail the underlying Map job? Could
this be done in the Bzip2TextInputFormat class in Pig?
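
One possible shape for such a check, sketched in Python rather than in Pig's Java code: decode one stream, then fail if bytes remain after its end-of-stream marker. The function names and the ConcatenatedStreamError class are hypothetical, and nothing here reflects the actual Bzip2TextInputFormat API. The gzip variant is included because the issue title also covers concatenated gzip files.

```python
import bz2
import zlib


class ConcatenatedStreamError(ValueError):
    """Hypothetical error: input holds data after the first compressed stream."""


def read_single_bz2_stream(data: bytes) -> bytes:
    """Decode one bzip2 stream; fail if anything follows it."""
    decoder = bz2.BZ2Decompressor()
    records = decoder.decompress(data)
    # unused_data holds whatever came after the end-of-stream marker.
    if decoder.eof and decoder.unused_data:
        raise ConcatenatedStreamError(
            f"{len(decoder.unused_data)} bytes follow the first bzip2 stream"
        )
    return records


def read_single_gzip_member(data: bytes) -> bytes:
    """Decode one gzip member; fail if anything follows it."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(zlib.MAX_WBITS | 16)
    records = decoder.decompress(data)
    if decoder.eof and decoder.unused_data:
        raise ConcatenatedStreamError(
            f"{len(decoder.unused_data)} bytes follow the first gzip member"
        )
    return records
```

A caller that prefers a warning over a failure could catch ConcatenatedStreamError and log it instead. Either way, the dropped records would no longer go unnoticed.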

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

