hadoop-common-dev mailing list archives

From "Nigel Daley (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1694) lzo compressed input files not properly recognized
Date Wed, 08 Aug 2007 18:01:14 GMT
lzo compressed input files not properly recognized
--------------------------------------------------

                 Key: HADOOP-1694
                 URL: https://issues.apache.org/jira/browse/HADOOP-1694
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.14.0
            Reporter: Nigel Daley
             Fix For: 0.15.0


When running the wordcount example on a mix of plain text, gzip-compressed, and lzo-compressed
input files, the lzo-compressed files are not recognized as compressed and are read as plain text.

With an input directory containing
    /user/hadoopqa/input/part-001.txt
    /user/hadoopqa/input/part-002.txt.gz
    /user/hadoopqa/input/part-003.txt.lzo
and running this command
    bin/hadoopqa jar hadoop-examples.jar wordcount /user/hadoopqa/input /user/hadoopqa/output
I get output that looks like
    row     4
    royal   4
    rt$3-ex?ÔøΩ?÷µIStÔøΩ"4D%ÔøΩ9$UÔøΩÔøΩ"ÔøΩ,       1
    ru$ÔøΩÔøΩ#~t"@ÔøΩm*d#\/$ÔøΩÔøΩl.t"XÔøΩÔøΩDi"    1
    rubbÔøΩdÔøΩ&@bT 1
    rubbed  2

To lzo compress the file I used lzop:
http://www.lzop.org/download/lzop-1.01-linux_i386.tar.gz
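When reproducing this, it can help to confirm what each input file really contains before
blaming the job. The sketch below is not Hadoop code; it is a standalone check meant to be run
against local copies of the input files. It compares the format suggested by the file extension
with the format suggested by the leading bytes. The function names are hypothetical, and the
extension-to-format mapping is an assumption made for illustration, not a description of how
Hadoop selects a codec.

```python
import os

# Leading "magic" bytes. The gzip value is defined in RFC 1952; the lzop
# value is the 9-byte header magic written by the lzop command-line tool.
GZIP_MAGIC = b"\x1f\x8b"
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"

# Assumed mapping for illustration only (hypothetical, not Hadoop's table).
EXTENSION_FORMATS = {".gz": "gzip", ".lzo": "lzo"}


def guess_by_extension(path):
    """Return the format implied by the final file extension."""
    ext = os.path.splitext(path)[1].lower()
    return EXTENSION_FORMATS.get(ext, "text")


def guess_by_magic(header):
    """Return the format implied by the first bytes of the file."""
    if header.startswith(LZOP_MAGIC):
        return "lzo"
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    return "text"


def classify(path):
    """Return (format by extension, format by content) for a local file."""
    with open(path, "rb") as f:
        header = f.read(len(LZOP_MAGIC))
    return guess_by_extension(path), guess_by_magic(header)
```

If `classify` returns `("lzo", "lzo")` for a local copy of part-003.txt.lzo, the file itself is a
well-formed lzop file with the expected name, which points at the reading side rather than at
how the file was produced.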


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

