hadoop-mapreduce-issues mailing list archives

From "Yuri Pradkin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-477) Support for reading bzip2 compressed file created using concatenation of multiple .bz2 files
Date Wed, 26 May 2010 22:41:44 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871993#action_12871993 ]

Yuri Pradkin commented on MAPREDUCE-477:
----------------------------------------

Just tried this on our cluster:
    echo "content1" | bzip2 - >foo.bz2
    echo "content2" | bzip2 - >>foo.bz2
    bzcat foo.bz2
    {quote}
    content1
    content2
    {quote}
    hdfs -put foo.bz2 foo.bz2
    hadoop jar .../hadoop-streaming.jar -input foo.bz2 -output foo -mapper /bin/cat -reducer /bin/cat

This completes after scheduling a ridiculous number of splits (98).

    hdfs -getmerge foo foo
    cat foo
    {quote}
    content1
    content2
    {quote}

mapreduce/common: trunk rev 897063
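
For readers unfamiliar with the underlying problem, here is a minimal sketch of why concatenated .bz2 files need explicit support. It uses Python's standard bz2 module purely as an illustration; it is not Hadoop's Bzip2Codec, and the helper name decompress_all is hypothetical. A decoder that handles only one stream stops after "content1", while restarting the decoder on the leftover bytes recovers both records.

```python
import bz2

# Two independently compressed streams, concatenated byte-for-byte,
# mirroring `bzip2 - >foo.bz2` followed by `bzip2 - >>foo.bz2`.
data = bz2.compress(b"content1\n") + bz2.compress(b"content2\n")

# A single-stream decoder stops at the end of the first stream.
single = bz2.BZ2Decompressor()
first_only = single.decompress(data)
leftover = single.unused_data  # raw bytes of the second stream, not decoded


def decompress_all(buf):
    """Decode every concatenated stream by restarting the decoder
    on whatever bytes the previous stream left unused."""
    out = []
    while buf:
        decoder = bz2.BZ2Decompressor()
        out.append(decoder.decompress(buf))
        buf = decoder.unused_data
    return b"".join(out)


print(first_only)            # only the first record
print(decompress_all(data))  # both records
```

A reader that behaves like the single-stream decoder would silently drop everything after the first stream, which is the failure mode this issue describes.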


> Support for reading bzip2 compressed file created using concatenation of multiple .bz2 files
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-477
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-477
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Suhas Gogate
>            Priority: Minor
>
> Bzip2Codec, supported in Hadoop 0.19/0.20, should support reading bzip2 compressed files created by concatenating multiple .bz2 files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

