hadoop-common-issues mailing list archives

From "David Ciemiewicz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6335) Support reading of concatenated gzip and bzip2 files
Date Wed, 26 May 2010 18:00:53 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871854#action_12871854 ]

David Ciemiewicz commented on HADOOP-6335:
------------------------------------------

@Chris Douglas

Yes, I have verified that this is not fixed in the version of Hadoop that I am using: Hadoop 0.20.10.0.1004192217

I cannot speak as to whether or not this is fixed in the trunk.

I created two files, file1.bz2 and file2.bz2, and concatenated them into file12.bz2:

-bash-3.1$ bzcat file12.bz2
contents of file1.bz2
contents of file2.bz2
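
For anyone who wants to rebuild an equivalent test file without the original inputs, here is a minimal Python sketch. It assumes the two files were joined byte for byte (as `cat file1.bz2 file2.bz2` would do) and that each held the single line shown by bzcat; the variable names are only illustrative.

```python
import bz2

# Hypothetical stand-ins for the two source files from this report.
part1 = bz2.compress(b"contents of file1.bz2\n")
part2 = bz2.compress(b"contents of file2.bz2\n")

# Byte-level concatenation: two complete bzip2 streams back to back.
file12 = part1 + part2

# bz2.decompress (Python 3.3+) decodes every stream in the input,
# which matches what bzcat prints for the concatenated file.
decoded = bz2.decompress(file12)
print(decoded.decode(), end="")
```

Writing `file12` to disk should give a file that behaves like file12.bz2 above.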

I then ran a simple Pig script to dump the contents of this file:

-bash-3.1$ cat concat.pig
A = load 'file12.bz2' using PigStorage();
dump A;

The output below shows that only the first file in the concatenation is read. The second file is not read, even though the job reports success.

-bash-3.1$ pig -Dmapred.job.queue.name=... concat.pig
USING: /grid/0/gs/pig/current
2010-05-26 17:54:06,501 [main] INFO org.apache.pig.Main - Logging error messages to: /homes/ciemo/.../pig_1274896446499.log
2010-05-26 17:54:06,750 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://...:8020
2010-05-26 17:54:07,001 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: ...:50300
2010-05-26 17:54:07,830 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2010-05-26 17:54:07,830 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2010-05-26 17:54:08,804 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-05-26 17:54:08,835 [Thread-9] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-05-26 17:54:09,834 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Cannot get jobid for this job
2010-05-26 17:54:32,745 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-05-26 17:55:09,412 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-05-26 17:55:09,412 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in: "hdfs://...
2010-05-26 17:55:11,158 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : 1
2010-05-26 17:55:11,159 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : 34
2010-05-26 17:55:11,159 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
(contents of file1.bz2)

The dump should have shown the contents of both file1.bz2 and file2.bz2.
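
The symptom is what one would expect from a decoder that stops at the end of the first compressed stream. The Python sketch below illustrates that difference using the standard `bz2` module. It is not Hadoop's code, and I have not confirmed that this is the actual cause in the codec; the function names are hypothetical.

```python
import bz2


def read_first_stream(data: bytes) -> bytes:
    """Decode only the first bzip2 stream and ignore anything after it."""
    return bz2.BZ2Decompressor().decompress(data)


def read_all_streams(data: bytes) -> bytes:
    """Decode every concatenated stream by restarting at each stream boundary."""
    chunks = []
    while data:
        decomp = bz2.BZ2Decompressor()
        chunks.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        # Bytes left over after the end-of-stream marker belong to the next stream.
        data = decomp.unused_data
    return b"".join(chunks)


# Same layout as file12.bz2: two complete streams back to back.
file12 = bz2.compress(b"contents of file1.bz2\n") + bz2.compress(b"contents of file2.bz2\n")

print(read_first_stream(file12))  # only the first file's contents
print(read_all_streams(file12))   # both files' contents
```

A reader that behaves like `read_first_stream` would produce exactly one record for this input, which is consistent with the "Records written : 1" line in the log.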



> Support reading of concatenated gzip and bzip2 files
> ----------------------------------------------------
>
>                 Key: HADOOP-6335
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6335
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ravi Gummadi
>
> GzipCodec.GzipInputStream needs to support reading of concatenated gzip files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

